MLC NAND flashes bring a lot of new problems that have to be solved in the kernel, in the MTD layer, in UBI, in UBIFS.
New constraints in MLC NAND: reduced P/E cycles (5000), pages must be programmed in ascending order, data retention is much shorter (SLC NANDs suffer from this too, but less), paired pages, unstable bits.
Reduced P/E cycles are not a big issue: just reduce the maximum EC imbalance in UBI.
Data retention issues come from two sources: read/write disturbance, and inherent charge loss. The latter becomes much worse with increasing P/E cycles. Solutions: increase ECC strength, so errors are recoverable. If bitflips are found, the block will be re-written and retention is recovered. But unrecoverable errors are still possible.
The bits in an MLC cell are assigned to different pages. Therefore, if there is a powercut while programming the second page in the same cell, you may corrupt the already-programmed page. In addition, pages are not paired simply N to N+1; instead it is N with N+3, except for the first and last page.
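The pairing rule can be made concrete with a small sketch. It assumes one common reading of the scheme: the lower pages are page 0 and the odd pages, each paired with the page 3 further on, while the first and last pair are only 2 apart. Real chips differ, so check the datasheet; the function names are made up for illustration.

```python
def upper_page_of(lower: int, pages_per_block: int) -> int:
    """Return the upper page sharing cells with `lower` (assumed scheme)."""
    last = pages_per_block - 1
    if lower == 0:
        return 2                      # first pair: distance 2
    if lower % 2 == 1 and lower == last - 2:
        return last                   # last pair: distance 2
    if lower % 2 == 1 and lower < last - 2:
        return lower + 3              # general case: N pairs with N+3
    raise ValueError(f"page {lower} is not a lower page")


def lower_pages(pages_per_block: int) -> list[int]:
    """Pages usable in 'lower pages only' mode under the assumed scheme."""
    return [0] + list(range(1, pages_per_block - 2, 2))
```

Under this assumption, a powercut while programming page 4 puts the data already stored in page 1 at risk, and a block of 256 pages has 128 lower pages.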
Unstable bits may occur when a program or erase operation is interrupted by a powercut just before it finishes. Then you may read correct data the first time, but the second time it’s full of uncorrectable errors. Not seen in tests, but reported on the mailing list. Boris thinks that you will actually still see a lot of correctable bitflips when you can indeed read the data back, so in practice with UBI you won’t see the problem because it will go into scrubbing when there are bitflips.
The last written page in MLC flash seems to show a lot of bitflips, which disappear when you write the next page. The bitflips are correctable, but because UBI sees bitflips it will start scrubbing which is not good.
The good news is that all of the problems go away if you only use the lower pages, but then you have only half the flash capacity.
UBI terminology: PEB = actual eraseblock in flash; LEB = (smaller) eraseblock as seen by the upper layer, can be mapped anywhere in flash; image = UBI on your MTD partition; device = runtime representation (/dev/ubiN); volume = device that is usable by upper layer (/dev/ubiN_M); attach = process of creating the UBI device for an MTD partition.
On flash, UBI adds an Erase Counter and Volume ID header to each eraseblock. Writing full LEBs can be done atomically (“atomic LEB change”), so it is powercut safe. In this case, it’s copy-on-write: first the new block is written with a sequence counter and CRC, only afterwards the old version is erased. If the power is cut in between, both are present, but either the new one is complete and has a higher sequence number, or its CRC is incorrect and the old one will be used. The Erase Counter is written back immediately after erasing, with the counter incremented. The Volume ID header is written as soon as the PEB is assigned (=mapped) to a LEB of a volume.
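How the surviving copy is chosen after a powercut can be sketched as follows. This is only an illustration of the rule described above, not UBI's actual code: the `LebCopy` type and `pick_valid_copy` are hypothetical names, and `zlib.crc32` stands in for whatever CRC the on-flash headers really use.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class LebCopy:
    sqnum: int        # sequence number stored with the copy
    data: bytes       # payload as read back from flash
    data_crc: int     # CRC recorded for the payload when it was written


def pick_valid_copy(copies: list[LebCopy]) -> Optional[LebCopy]:
    """Pick the newest copy whose payload matches its recorded CRC."""
    for copy in sorted(copies, key=lambda c: c.sqnum, reverse=True):
        if zlib.crc32(copy.data) == copy.data_crc:
            return copy
    return None   # no intact copy found
```

If the new copy was written completely, it wins because of its higher sequence number; if the write was interrupted, its CRC check fails and the old copy is used.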
First page of each eraseblock contains EC. If subpages are supported, VID will be put in second sub-page of first page, otherwise it’s on the second page. The other pages are used for payload.
Data retention issues can be seen really quickly on MLC. Read disturb is sometimes visible after 500 reads of a page (vs. 100K on SLC). Scrubbing is done if bitflips are encountered, so the whole flash should be read regularly to detect those bitflips. It is sufficient to do dd if=/dev/ubi0_X of=/dev/null for every volume regularly, but “regularly” would be too frequent for MLC, and in addition these reads would themselves create new read disturb. ubihealthd is a daemon that monitors read counters and triggers a read of the whole PEB if some pages have been read but some others haven’t been read for a while. Read counters are stored in memory. ubihealthd writes them to flash with a certain frequency; it doesn’t matter if the counters are inexact.
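The kind of decision such a daemon has to make can be sketched roughly as below. This is not ubihealthd's actual algorithm: the `PebStats` type, the `pebs_to_check` function, and the age limit are assumptions made for illustration, and only the figure of 500 reads comes from the talk.

```python
from dataclasses import dataclass


@dataclass
class PebStats:
    reads_since_check: int = 0     # page reads since the last full-PEB read
    last_check: float = 0.0        # time of the last full-PEB read (seconds)


def pebs_to_check(stats: dict[int, PebStats], now: float,
                  max_reads: int = 500, max_age: float = 86400.0) -> list[int]:
    """Return the PEB numbers that are due for a full read (hypothetical policy)."""
    due = []
    for peb, s in sorted(stats.items()):
        # Due if the PEB was read often (read disturb) or not checked
        # for a long time (charge loss).
        if s.reads_since_check >= max_reads or now - s.last_check >= max_age:
            due.append(peb)
    return due
```

Because the counters only drive a heuristic, losing a few increments between two writes of the counters to flash is harmless.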
For paired pages, the page pairing scheme used by the flash chip should be added to the DT. This will be in 4.9. But then the upper layers still have to make use of it.
UBIFS could write everything in SLC mode (using only lower pages) and switch to MLC mode when it runs out of space. This works because it knows which data is valid and which data is dirty, and when a sync or commit is done. However this becomes very complicated, because UBIFS has to work with two possible LEB sizes which changes everything.
So instead implement it in UBI. By default, every PEB is written in safe mode (using only lower pages). When UBI runs out of space, consolidate two fully written LEBs into a single fully written PEB. This gains one more available LEB. All pages will be available then. However, this creates more states, because a consolidated LEB may be unmapped, so the PEB will be half-full and half-invalid.
New terms: SLC LEB = LEB on lower pages; Full LEB = SLC LEB where the last page is written (= completely written, though it is not necessary to actually write the in-between pages; they are not usable anymore anyway).
Consolidation should be started earlier than when no PEBs are available anymore, so low and high watermark for consolidation.
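The watermark idea can be illustrated with a toy model. It assumes that every full SLC LEB occupies one PEB and that a merge needs one free PEB as its target, which is one reason to start before the free PEBs run out; the function name and parameters are hypothetical, not UBI's.

```python
def consolidate(free_pebs: int, full_lebs: int,
                low_mark: int, high_mark: int) -> tuple[int, int, int]:
    """Return (free_pebs, full_lebs, consolidated_pebs) after one pass.

    Merging two full SLC LEBs consumes one free PEB as the target and
    releases the two source PEBs: a net gain of one free PEB per merge.
    """
    consolidated = 0
    if free_pebs >= low_mark:
        return free_pebs, full_lebs, consolidated   # nothing to do yet
    # Keep merging until the high watermark is reached or we run out
    # of candidates (or of a free PEB to use as the target).
    while free_pebs < high_mark and full_lebs >= 2 and free_pebs >= 1:
        free_pebs += 1        # -1 target PEB, +2 released source PEBs
        full_lebs -= 2
        consolidated += 1
    return free_pebs, full_lebs, consolidated
```

Starting at the low watermark and stopping at the high one avoids both running completely dry and consolidating more than needed.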
On MLC there are no subpages, so EC will always be on page 0 and VID on page 1. Page 1 is always also a lower page. In MLC mode (consolidated), there is only one EC, and page 1 contains both VID headers.
In MLC, there will be twice as many LEBs as PEBs. This leads to confusion, because sometimes “number of LEBs” is used where it should actually be PEBs (in documentation, comments, variable names).
Space reservation for housekeeping in UBI acts on PEBs, so the number of available LEBs is smaller than 2*nPEBs. Also, if you write only one page to every LEB, they can’t be consolidated. In fact you can never actually use 2*nPEBs for any writable volume.
Zombie LEB: if a consolidated LEB is overwritten, a new copy is created and the consolidated one becomes invalid. This is only visible in the difference in sequence numbers. When later on the LEB is unmapped, it will be really removed from the flash, so it is not visible anymore that it’s invalid. So, either keep track of invalidated LEBs, or force UBI users to use atomic updates instead of unmapping. The latter was chosen because UBIFS is the main user and it could be modified.
To use UBI on MLC NAND, you should:
- Add the pairing information to DT.
- Make sure you have strong ECC.
- Tell ubinize about the pairing scheme (it will consolidate).
- Run ubihealthd.
Lessons learned so far:
- Consolidation is slow, because the involved LEBs have to be locked, which locks down other things as well. The performance impact is rather high, because it’s a lot of data.
- Space reservation is difficult because real nLEBs is unknown.
- Now full LEBs are consolidated, but this may not be the best choice. Maybe an LRU list would be better.
Improvements based on the above lessons are in the works. Especially the performance hit when the unconsolidated LEBs are used up is bad.
Currently Boris and Richard are somewhat on their own in this work. There is no support from NAND vendors, there aren’t many people that can test patches.
Testing is done with nandsim but mostly on actual devices, because only they show the actual problems.