Soft-Decoding in LDPC based Next-Generation SSD Controllers
Author: Stephen Bates
Introduction
In my last post, I talked about how we can control the parameters of Low-Density Parity-Check (LDPC) error correction codes in order to manage the latency associated with reads from a Solid-State Drive (SSD). However, we only looked at the iterations associated with a single decode of the LDPC codeword. In this post, we will take a look at what happens when that initial decode fails and how soft information can be used to recover the data on the SSD.
Hard Data Decoding and Soft Data Decoding
In 1948, Claude Shannon published his seminal paper “A Mathematical Theory of Communication,” which kick-started the discipline of information theory. The work in this paper is still used today to determine how good or bad an error correction code is, because it defined a performance bound beyond which no error correction code can go. Interestingly, for any given channel there exist two such bounds: one for decoding using hard data and one for decoding using soft data. A few examples of this are given in Table 1 below.
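As a concrete illustration of the hard-data bound (this is a standard information-theory result, not a reproduction of Table 1): if we keep only the 0/1 decisions from the flash, the channel behaves like a binary symmetric channel with raw bit error rate p, whose capacity in information bits per raw bit is

    C_{hard} = 1 - H_2(p), \qquad H_2(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

At p = 0.01 this gives C_{hard} ≈ 0.92, so no code with a rate above 0.92 can reliably protect that channel using hard data alone. The soft-data bound for the same underlying cell voltages is strictly higher, and that gap is exactly what a soft decode exploits.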
When we perform a read from NAND flash, we typically only get back the ones and zeros associated with the data on the flash. As such, an initial LDPC decode of that data is a decode based on hard data, since the decoder has no knowledge of which of those ones and zeros are good and which are dubious (there are a few tricks that people play here, but for now let's assume my statement is true). In this case, the performance of that initial decode is limited by the hard-data bound in Table 1.
LDPC and Soft-Data Decoding in NAND Flash
If we assume that a hard-data LDPC decode fails, then things start to get very interesting for the SSD. We could decide to return an “Unrecoverable Read Error” and tell the user that their data is lost forever; however, end-users typically don’t like that ;-). If the SSD has an internal RAID system, we could attempt to recover the user’s data that way. With LDPC, however, there is a third option: soften the data and attempt a soft-data LDPC decode. Note that this third option is not available in controllers that use less advanced error correction (for example, BCH codes) because those codes cannot leverage soft information. This option allows us to move from the hard-data bound in Table 1 to the soft-data bound and hence operate in noisier environments.
I like to think of a soft-data LDPC decode in three parts:
- A re-read strategy.
- Soft-data construction.
- The soft LDPC decode.
Let’s look at each one of these in turn.
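Before digging into each stage, here is a minimal sketch of how the three might fit together in a controller's read path. Everything here, function names, types, and sizes alike, is hypothetical and purely for illustration; it is not a real controller API:

    /* Hypothetical recovery flow: hard decode first, then the three
     * soft-decode stages described in this post. All NAND and decoder
     * primitives are invented for illustration. */
    #include <errno.h>
    #include <stdint.h>

    #define CODEWORD_BYTES 4608                /* e.g. 4KiB payload plus parity */
    #define CODEWORD_BITS  (CODEWORD_BYTES * 8)

    struct nand_addr { uint32_t block, page; };

    /* Hypothetical controller/NAND primitives, declared but not defined. */
    extern void nand_read(struct nand_addr addr, uint8_t *out);
    extern void nand_read_with_offset(struct nand_addr addr, int offset_mv,
                                      uint8_t *out);
    extern int  ldpc_decode_hard(const uint8_t *hard, uint8_t *decoded);
    extern int  ldpc_decode_soft(const int8_t *llr, uint8_t *decoded);
    extern void build_llrs(const uint8_t *read0, const uint8_t *read1,
                           int8_t *llr);

    int read_codeword(struct nand_addr addr, uint8_t *user_data)
    {
        static uint8_t hard[CODEWORD_BYTES], reread[CODEWORD_BYTES];
        static int8_t  llr[CODEWORD_BITS];

        nand_read(addr, hard);
        if (ldpc_decode_hard(hard, user_data) == 0)
            return 0;                  /* the common case: hard decode works */

        /* Stage 1: Re-Read Strategy (here: one shifted-threshold read). */
        nand_read_with_offset(addr, -40 /* mV, illustrative */, reread);

        /* Stage 2: soft-data construction maps the two reads to LLRs. */
        build_llrs(hard, reread, llr);

        /* Stage 3: soft LDPC decode on the constructed LLRs. */
        if (ldpc_decode_soft(llr, user_data) == 0)
            return 0;

        return -EIO;   /* fall back to internal RAID or report a read error */
    }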
The Re-Read Strategy
The Re-Read Strategy consists of reading one or more sections of the flash in such a way as to assist in the construction of soft data. There are a lot of different options here, both in terms of which section of the flash to read and in terms of how those reads are performed. For those of you with an information theory background, what we are trying to do is maximize the mutual information between the reads and the data originally written to the flash, in order to generate the best soft data we possibly can. Some examples of a Re-Read Strategy might include:
- Read the same section as the original hard data but using a different set of read threshold voltages inside the NAND.
- In MLC NAND, read the section that shares the same word-line as the original section.
- Read the section that corresponds to the dominant disturber. This is the section that, when programmed, has the strongest program disturb impact on the original hard-data section.
There are pros and cons to each of the three Re-Read Strategies outlined above and, in fact, the three can even be combined if desired. Just remember that each time you read from the flash you will incur more latency!
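To make the first option a little more concrete, here is a hedged sketch of a threshold-shifting re-read loop, reusing the hypothetical declarations from the sketch above. The offset table is invented; real offsets come from NAND characterization, and real parts expose this capability through vendor-specific read-retry or set-feature commands:

    /* Hypothetical Re-Read Strategy #1: collect extra reads of the same
     * section at shifted read threshold voltages. Offset values are
     * illustrative only. */
    static const int retry_offsets_mv[] = { -40, +40, -80, +80 };

    int collect_rereads(struct nand_addr addr,
                        uint8_t reads[][CODEWORD_BYTES], int max_reads)
    {
        int n = 0;
        int table_len = sizeof(retry_offsets_mv) / sizeof(retry_offsets_mv[0]);

        for (int i = 0; i < table_len && n < max_reads; i++) {
            /* every extra read costs roughly one more NAND tR of latency */
            nand_read_with_offset(addr, retry_offsets_mv[i], reads[n++]);
        }
        return n;      /* number of additional reads gathered */
    }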
The Soft-Data Construction
Each of the reads in our Re-Read Strategy returns the zero and one data associated with that read. (Multi-bit reads do exist in some more advanced NAND, but we will ignore them for the purposes of this blog.) Each read of the flash therefore gives us one more bit of physical information per cell; however, we need to map these physical zeros and ones into soft information for the LDPC decode. This mapping requires an understanding of both the NAND flash and the LDPC codes being used.
We’ll illustrate what we mean with a very simple example. For a hard-data decode, we can use a simple mapping to convert the zeros and ones from the flash into information the LDPC decoder can consume. We call the outputs of this mapping Log-Likelihood Ratios (LLRs). This mapping is given in Table 2.
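Table 2 is not reproduced in this text, but a typical hard-data mapping looks like the sketch below. The sign convention (positive LLR means "probably a 0") and the saturation magnitude of 15 are assumptions for illustration; in practice the magnitude would be derived from the measured raw bit error rate:

    /* Hard-data LLR mapping: with a single read, every bit gets the
     * same (full) confidence. Convention here: LLR = log(P(0)/P(1)),
     * so positive means "probably a 0". Magnitude is illustrative. */
    #define HARD_LLR_MAG 15

    static inline int8_t hard_bit_to_llr(uint8_t bit)
    {
        return bit ? -HARD_LLR_MAG : +HARD_LLR_MAG;
    }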
Now assume our Re-Read Strategy consists of one additional read. Our soft-data construction might then look something like what is shown in Table 3.
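In the same spirit as Table 3 (again, the numbers below are illustrative assumptions, not the actual table), a two-read construction could assign a large-magnitude LLR when the two reads agree and a small one when they disagree, since disagreement suggests the cell's voltage sits near the read threshold:

    /* Soft-data construction from the original read plus one re-read.
     * Agreement -> confident LLR; disagreement -> the cell is near the
     * threshold, so low confidence. The magnitudes (15 and 3) and the
     * choice to keep the original read's polarity on disagreement are
     * illustrative. */
    static inline int8_t two_read_llr(uint8_t bit0, uint8_t bit1)
    {
        int8_t mag = (bit0 == bit1) ? 15 : 3;
        return bit0 ? -mag : +mag;
    }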
The Soft LDPC Decode
The final step in the soft-data decode involves passing the LLRs for each bit of the codeword into the LDPC decoder logic. The hope is that this decode will be more successful than the original hard-data decode, and that the SSD will now be able to return the user data, perhaps also moving that data to a safer region of the SSD so that it can be recovered more easily the next time it is requested.
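The decoder internals are beyond the scope of this post, but to give a flavor of what "consuming LLRs" means: many hardware LDPC decoders use a variant of the min-sum algorithm, in which each parity check sends each of its bits a message whose sign is the product of the other bits' LLR signs and whose magnitude is the minimum of their magnitudes. A minimal sketch of that single update (the function name is mine, not a standard API):

    /* One min-sum check-node update: the message a parity check over n
     * bits sends to bit `skip`, computed from the other n-1 LLRs. */
    #include <stdint.h>
    #include <stdlib.h>

    static int8_t check_to_bit_msg(const int8_t *llr, int n, int skip)
    {
        int sign = 1;
        int min_mag = 127;             /* max magnitude of an int8_t LLR */

        for (int i = 0; i < n; i++) {
            if (i == skip)
                continue;
            if (llr[i] < 0)
                sign = -sign;
            int mag = abs(llr[i]);
            if (mag < min_mag)
                min_mag = mag;
        }
        return (int8_t)(sign * min_mag);
    }

A full decoder alternates these check-node updates with variable-node updates (summing the incoming messages with the channel LLR) until the parity checks are satisfied or an iteration limit, like the ones discussed in my last post, is reached.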
LDPC White Paper
In my last few blog posts I have been discussing the various ways that LDPC can impact how we use NAND flash in SSDs. We have a more detailed white paper for those of you who are interested in a deeper level of information. Feel free to download that paper, and please let us know what you think!
Coming Soon
This concludes my blog series on LDPC codes. Stay tuned for a new series that looks at how we can use PCIe-attached NVM devices to do some interesting things in the data center. We will present results related to this and several other topics at Flash Memory Summit 2014, so if you are there please come and check us out at the technical sessions or in booth #416!
Read Part 1 of the LDPC series: Transitioning SSDs to LDPC Error Correction Codes
Read Part 2 of the LDPC series: Latency in LDPC-based Next-Generation SSD Controllers