MLC SSD card lifetime and write amplification

As MLC-based SSD cards gain popularity, there is also growing concern about how long they can survive. As we know, an MLC NAND module can handle 5,000-10,000 erase cycles, after which it becomes unusable, so an SSD card based on MLC NAND obviously has a limited lifetime. There are a lot of misconceptions and misunderstandings about how long such a card can last, so I want to show some calculations to shed light on this question.

As a base I will take the Virident FlashMAX M1400 (1.4TB) card. Virident guarantees 15PB (PB as in petabytes) of writes on this card.
15PB sounds impressive, but how many years does it correspond to? Of course it depends on your workload, and mainly on how write-intensive it is. But there are some facts that can help you estimate.

On Linux you can look into the /proc/diskstats file, which shows something like:

 251       0 vgca0 30273954 0 968968610 416767 122670649 0 8492649856 19260417 0 19677184 220200747

where 8492649856 is the number of sectors written since the reboot (a sector is 512 bytes).
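
To make the conversion concrete, here is a minimal Python sketch that pulls the sectors-written counter out of the sample line above and turns it into bytes. The function name `sectors_written` and the constant `SECTOR_BYTES` are my own illustrative names. The field position assumes the standard /proc/diskstats layout, in which sectors written is the tenth whitespace-separated field.

```python
SECTOR_BYTES = 512  # sector size as stated in the text above


def sectors_written(diskstats_line: str) -> int:
    """Return the 'sectors written' counter from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # Assumed layout: major, minor, name, reads completed, reads merged,
    # sectors read, ms reading, writes completed, writes merged,
    # sectors written, ...
    return int(fields[9])


# The sample line quoted in the text.
line = (" 251       0 vgca0 30273954 0 968968610 416767 "
        "122670649 0 8492649856 19260417 0 19677184 220200747")

bytes_written = sectors_written(line) * SECTOR_BYTES
print(bytes_written)
```

Taking this reading twice and subtracting the first result from the second gives the bytes written by the host in that interval.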

Now you could say that we can sample /proc/diskstats at a 1-hour interval, see how many bytes we write per hour, and calculate the potential lifetime from that.
This is only partially correct. There is a factor called write amplification, which is described very well on Wikipedia; basically, SSD cards, due to their internal organization, write more data to flash than they receive from the application.
Usually the write amplification is equal or very close to 1 (meaning there is no overhead) for sequential writes, and it reaches its maximum for fully random writes. That value can be 2-5 or more and depends on many factors, such as the used capacity and the space reserved for over-provisioning.

Basically, it means you should look at the card's own statistics to get the exact number of bytes written.
For the Virident FlashMAX it is

vgc-monitor -d /dev/vgca  | grep writes
                                 379835046150144 (379.84TB) (writes)
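
If you want to collect this counter from a script, something like the following Python sketch could work. It assumes the output always contains a line shaped like the one shown above (a raw byte count, a human-readable size, then "(writes)"). I have only this one sample to go on, so treat the parsing as a guess about the format; `device_bytes_written` is a hypothetical helper name.

```python
import re


def device_bytes_written(monitor_output: str) -> int:
    """Extract the raw byte counter from the '(writes)' line of the output."""
    for line in monitor_output.splitlines():
        if "(writes)" in line:
            # Take the leading run of digits as the byte counter.
            match = re.match(r"\s*(\d+)", line)
            if match:
                return int(match.group(1))
    raise ValueError("no '(writes)' line found")


# The sample output line quoted in the text.
sample = "                                 379835046150144 (379.84TB) (writes)"
print(device_bytes_written(sample))
```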

With this information, let's take a look at what lifetime we can expect under a tpcc-mysql workload.
I ran 32 user threads against a 5000W dataset (about 500GB of data on disk) for 1 hour.

After 1 hour, /proc/diskstats shows 984,442,441,728 bytes written, which is 984.44GB, and the Virident stats show 1,125,653,692,416 bytes written, which is 1,125.65GB.
This allows us to calculate the write amplification factor, which in our case is
1,125,653,692,416 / 984,442,441,728 = 1.143. This looks very decent, but remember we use only 500GB out of 1400GB, and the factor will grow as we fill up more space.
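
The same calculation as a small Python sketch, using the two byte counts measured above. The function name `write_amplification` is mine, chosen for illustration.

```python
def write_amplification(device_bytes: int, host_bytes: int) -> float:
    """Ratio of bytes the card wrote to flash over bytes the host sent."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return device_bytes / host_bytes


host_bytes = 984_442_441_728      # from /proc/diskstats over the 1-hour run
device_bytes = 1_125_653_692_416  # from the card's own counter, same hour

waf = write_amplification(device_bytes, host_bytes)
print(f"{waf:.3f}")  # prints 1.143
```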

Please note that we put quite an intensive write load on the card during this hour.
MySQL handled 25,000 updates/sec, 20,000 inserts/sec and 1,500 deletes/sec, which corresponds to a write throughput of 273.45MB/sec from MySQL to disk.

This lets us calculate the lifetime of the card if we run such a workload 24/7 non-stop:
15PB (of total writes) / 1125.65GB (per hour) = 13,325.634 hours = 555.23 days = 1.52 years
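
Here is a rough Python sketch of that estimate, so you can plug in your own numbers. It assumes decimal units (1PB = 10^15 bytes), consistent with how GB is used above, and a 365-day year. The names are illustrative. It uses the exact hourly byte count instead of the rounded 1125.65GB, so the hours figure differs slightly in the decimals, but the days and years agree to two decimal places.

```python
ENDURANCE_BYTES = 15 * 10**15             # 15PB guaranteed writes (decimal PB assumed)
DEVICE_BYTES_PER_HOUR = 1_125_653_692_416  # card-level writes measured in 1 hour


def lifetime(endurance_bytes: int, bytes_per_hour: int):
    """Return (hours, days, years) until the write guarantee is used up."""
    hours = endurance_bytes / bytes_per_hour
    days = hours / 24
    years = days / 365
    return hours, days, years


hours, days, years = lifetime(ENDURANCE_BYTES, DEVICE_BYTES_PER_HOUR)
print(f"{days:.2f} days = {years:.2f} years")  # prints 555.23 days = 1.52 years
```

For a production estimate you would replace `DEVICE_BYTES_PER_HOUR` with an average taken over a day or a week of the card's counter.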

That is, under a non-stop tpcc-mysql workload we may expect the card to last 1.52 years. However, in real production you do not have a uniform load every hour, so you may want to base your estimate on daily or weekly stats.

Unfortunately there is no easy way to predict this number until you start running the workload on the SSD.
You can take a look at /proc/diskstats, but
1. There is a write amplification factor which you do not know, and
2. Throughput on a regular RAID is much lower than on an SSD, and you do not know what your throughput will be when you put the workload on the SSD.

(via SSD Performance Blog)

