Modernizing the NAND Framework: The Big Picture – Boris Brezillon, Free Electrons

Boris is working on various aspects of MTD (NAND drivers, UBI, UBIFS) and is reworking some of the infrastructure. With this talk, he wants to explain his plans and get feedback on them.

The current NAND framework was created a long time ago (2.4.6). It was meant to support the first NAND chips and very simple NAND controllers. New features have been added all the time, but there has been no big rethinking. For example, there is a lot of code duplication. This is exacerbated by the openness of the subsystem, which allows drivers to do the same thing in different ways and makes refactoring difficult. The framework is also missing some functionality required for MLC flashes.

For each new feature, a new hook was added to struct nand_chip. This makes it difficult for newcomers to understand the code, even where it is documented. It’s not clear which functions should be implemented. Some of them have default implementations if the controller doesn’t provide any, but that’s not always clear. Also, some of the core methods can be overridden by drivers, which decreases consistency.

Most current NAND peripherals can pipeline accesses, but the NAND framework serializes everything. This is not easy to change. A feature was added to use cached program operations during sequential write access, but it was disabled because there was no gain at the time. Boris re-enabled it and it gave a 20% speed-up on a DDR NAND, so this is encouraging.

The first problem is that there is no clear separation between the NAND chip and the NAND controller: the controller driver creates both. A first step is to move a lot of the hooks in struct nand_chip to struct nand_controller, possibly rethinking some of them in the process. It’s important to carefully decide which things belong to the chip and which to the controller. Finally, NAND chip registration can be automated outside of the specific controller driver, so that the two are clearly separated. There should be a 1-N relation between controller and chips.
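
As a rough illustration of that split, here is a minimal userspace sketch. All names ending in _sketch, the MAX_CHIPS limit and the choice of fields are assumptions made for this example; they are not the kernel's definitions and not necessarily what Boris proposed.

```c
#include <stddef.h>

#define MAX_CHIPS 4 /* arbitrary limit chosen for this sketch */

struct nand_chip_sketch;

/* Things that describe how to drive the bus live in the controller. */
struct nand_controller_sketch {
    const char *name;
    struct nand_chip_sketch *chips[MAX_CHIPS]; /* 1-N: one controller, N chips */
    size_t nchips;
};

/* Things that describe the flash itself live in the chip. */
struct nand_chip_sketch {
    unsigned int page_size;
    unsigned int pages_per_block;
    int cs;                                    /* chip-select line, assigned on registration */
    struct nand_controller_sketch *controller; /* back-pointer to the owning controller */
};

/*
 * Registration done by common code rather than by each controller driver.
 * Returns 0 on success, -1 if the controller has no free slot.
 */
int nand_register_chip_sketch(struct nand_controller_sketch *ctrl,
                              struct nand_chip_sketch *chip)
{
    if (ctrl->nchips >= MAX_CHIPS)
        return -1;
    chip->controller = ctrl;
    chip->cs = (int)ctrl->nchips;
    ctrl->chips[ctrl->nchips++] = chip;
    return 0;
}
```

The point of the sketch is only the ownership direction: a chip refers to exactly one controller, while a controller keeps a list of chips.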

Example of wrong design: nand_chip->cmdfunc(). This is the method to send a command to a chip. A NAND operation typically consists of two commands: a first command that introduces the address, and a second command for the data. For write page, for example, it is command 0x80, followed by the address, followed by the data, followed by 0x10. cmdfunc() only takes care of the command and address cycles, not of the data transfer itself; there are separate read/write_buf/byte/word hooks for that. Also, the core provides a default implementation that uses cmd_ctrl() to send single command and address cycles, so cmdfunc() only needs to be overridden for controllers that can handle full NAND operations. But this doesn’t work, because cmdfunc() doesn’t know how many bytes will be transferred, so the controller can’t be set up correctly. In addition, the operation is identified by parsing the command given, which means that all cmdfunc() implementations have to be modified to support a new command in a chip. That makes it difficult to support new features offered by either the NAND chip or the NAND controller.
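
The write-page sequence described above can be sketched as a trace of bus cycles. This is a userspace model, not kernel code: the sketch_ functions, the trace structure and its size are hypothetical, and only the 0x80 / address / data / 0x10 ordering comes from the talk. Note how the cmdfunc-like step receives no data length, which is the limitation being criticised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cycle_type { CYCLE_CMD, CYCLE_ADDR, CYCLE_DATA };

struct bus_cycle {
    enum cycle_type type;
    uint8_t value;
};

#define TRACE_MAX 64 /* arbitrary capacity for this sketch */

struct bus_trace {
    struct bus_cycle cycles[TRACE_MAX];
    size_t len;
};

/* Append one cycle to the trace. */
static void emit(struct bus_trace *t, enum cycle_type type, uint8_t value)
{
    assert(t->len < TRACE_MAX);
    t->cycles[t->len].type = type;
    t->cycles[t->len].value = value;
    t->len++;
}

/* cmdfunc-like step: command and address cycles only, no data length. */
static void sketch_cmdfunc(struct bus_trace *t, uint8_t cmd,
                           const uint8_t *addr, size_t naddr)
{
    emit(t, CYCLE_CMD, cmd);
    for (size_t i = 0; i < naddr; i++)
        emit(t, CYCLE_ADDR, addr[i]);
}

/* write_buf-like step: the data transfer is a separate call. */
static void sketch_write_buf(struct bus_trace *t, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        emit(t, CYCLE_DATA, data[i]);
}

/* Full page program: 0x80, address, data, 0x10. Returns the cycle count. */
size_t sketch_program_page(struct bus_trace *t,
                           const uint8_t *addr, size_t naddr,
                           const uint8_t *data, size_t len)
{
    sketch_cmdfunc(t, 0x80, addr, naddr);
    sketch_write_buf(t, data, len);
    sketch_cmdfunc(t, 0x10, NULL, 0);
    return t->len;
}
```

A controller that can execute the whole operation in hardware would need the information spread over these three calls at once, which the split interface cannot give it.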

The proposal is to introduce a new method on nand_controller that does the entire operation, including I/O. But that doesn’t handle the case where a NAND controller offers specific registers for high-level operations.

The NAND core tries to be smart so that controllers don’t have to implement everything, but sometimes the core makes the wrong decision, which leads to weird behaviour. Instead of trying to guess what the controller wants, the core should offer helpers: it provides default implementations, but the controller has to use them explicitly. If a method is not filled in, it will not be called and the operation will return -ENOTSUPP.
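
A minimal sketch of that "explicit helpers" rule, in userspace C. The ops structure and function names are hypothetical; ENOTSUPP is defined locally because it is a kernel-internal error code (524) that userspace <errno.h> does not provide.

```c
#include <stddef.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal errno value, defined here for the sketch */
#endif

/* Hypothetical per-controller operations table. */
struct ctrl_ops_sketch {
    int (*read_page)(unsigned int page, unsigned char *buf, size_t len);
};

/*
 * Default helper offered by the core. It is never installed implicitly:
 * a driver that wants it must point its ops at it.
 * Here it just pretends every page is erased (all 0xff).
 */
int core_default_read_page(unsigned int page, unsigned char *buf, size_t len)
{
    (void)page;
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xff;
    return 0;
}

/* Core entry point: no guessing, no silent fallback. */
int core_read_page(const struct ctrl_ops_sketch *ops, unsigned int page,
                   unsigned char *buf, size_t len)
{
    if (!ops->read_page)
        return -ENOTSUPP;
    return ops->read_page(page, buf, len);
}
```

A driver opts in with something like `.read_page = core_default_read_page`; a driver that leaves the field empty gets -ENOTSUPP instead of a default it never asked for.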

Some things that already happened: helpers to fix bitflips in erased pages (controller has to explicitly call them); common DT parsing; common setting of timings.

Currently the NAND layer and the MTD layer are quite mixed up. An mtd_device and a nand_device are essentially the same thing, and the two are used interchangeably throughout the code. The idea is that it will be possible to retrieve the mtd_device from the nand_device object, so that the mtd_device doesn’t need to be passed around anymore. Beyond that, use of mtd_device inside the NAND layer should be avoided, to keep the two layers separate.

What is needed to improve NAND performance

The simple case is a single single-die chip connected to the controller. The second case is a multi-die chip. The third case is multiple (single- or multi-die) chips. The second case is actually almost the same as the third case. Modern NAND controllers can make use of this by pipelining NAND accesses, interleaving operations between dies/chips, etc. However, the framework locks the controller during e.g. a read() operation, so no interleaved access to a different die/chip is possible. The idea is to queue requests to the NAND controller, and let the controller dispatch requests as needed and as possible.
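
The queueing idea can be sketched as follows. This is a toy userspace model under stated assumptions: the structure names, the fixed queue size and the "oldest request whose die is idle" policy are all invented for the example, and a real controller would track completion through interrupts or status polling.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIES  4  /* arbitrary limits for this sketch */
#define QUEUE_MAX 16

struct nand_req_sketch {
    int die;
    unsigned int page;
    bool issued;
};

struct ctrl_queue_sketch {
    struct nand_req_sketch reqs[QUEUE_MAX];
    size_t nreqs;
    bool die_busy[MAX_DIES];
};

/* Queue a request instead of locking the whole controller. */
int queue_req(struct ctrl_queue_sketch *q, int die, unsigned int page)
{
    if (q->nreqs >= QUEUE_MAX || die < 0 || die >= MAX_DIES)
        return -1;
    q->reqs[q->nreqs].die = die;
    q->reqs[q->nreqs].page = page;
    q->reqs[q->nreqs].issued = false;
    q->nreqs++;
    return 0;
}

/*
 * Issue the oldest pending request whose die is idle.
 * Returns the index of the request issued, or -1 if nothing can run.
 */
int dispatch_next(struct ctrl_queue_sketch *q)
{
    for (size_t i = 0; i < q->nreqs; i++) {
        struct nand_req_sketch *r = &q->reqs[i];

        if (!r->issued && !q->die_busy[r->die]) {
            r->issued = true;
            q->die_busy[r->die] = true;
            return (int)i;
        }
    }
    return -1;
}

/* Called when a die finishes its current operation. */
void complete_die(struct ctrl_queue_sketch *q, int die)
{
    q->die_busy[die] = false;
}
```

With two requests queued for die 0 and one for die 1, the dispatcher issues the first die-0 request, then skips ahead to the die-1 request while die 0 is busy, which is exactly the interleaving that a controller-wide lock prevents.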

Optimisations are also possible at the chip level: cached access, multi-plane access, multi-die access, DDR. These require support both in the chip and in the controller. A first step is to identify which features are available. ONFI and JEDEC standardise discovery, but the result has to be represented in nand_chip.

Cached access: allow access to the cells to happen in parallel with the transfer over the bus. This helps a lot on MLC, because pages are large and both I/O and ECC calculation take a lot of time.
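
To see why overlapping the two stages helps, here is a simple two-stage pipeline model. It is an idealised sketch, not a measurement: the function names are hypothetical, ECC time is ignored, and the array and bus times are whatever the caller supplies.

```c
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* No overlap: each page pays the array access and then the bus transfer. */
uint64_t serial_read_time_us(uint64_t npages, uint64_t t_array_us,
                             uint64_t t_bus_us)
{
    return npages * (t_array_us + t_bus_us);
}

/*
 * Cached access: while page i is on the bus, page i+1 is being fetched
 * from the array. The first array access and the last bus transfer
 * cannot be hidden; in between, the slower stage sets the pace.
 */
uint64_t cached_read_time_us(uint64_t npages, uint64_t t_array_us,
                             uint64_t t_bus_us)
{
    if (npages == 0)
        return 0;
    return t_array_us
         + (npages - 1) * max_u64(t_array_us, t_bus_us)
         + t_bus_us;
}
```

With made-up figures of 50 µs per array access and 100 µs per bus transfer, four pages take 600 µs serially and 450 µs with overlap in this model; the gain grows with the number of sequential pages.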

I/O scheduling: dispatch accesses to different dies in parallel. Contention on the bus has to be resolved, but there are long idle times on the bus so it is possible. The scheduling algorithm should be generic, but there are controller-specific bits to actually allow and control interleaving.

A lot of modern NANDs don’t comply with ONFI/JEDEC. They have additional features that are not usable through the normal commands. Currently, chip detection is in nand_base.c; it would be better to isolate things in vendor-specific C files, which fill in some of the nand_chip hooks.

Sharing code between NAND-based devices

spinand is a framework separate from nand (it’s also mtd, but a different lower layer). A lot of the code is similar, however, e.g. bad block management, MLC constraints, memory array organisation, etc. Factorising these commonalities sounds like a good idea. This work has already been started by Brian Norris and Peter Pan, and extended by Boris: create struct nand_device, make nand_chip, onenand_chip and spinand_chip derive from it, and have common code use nand_device.
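
In C, "derive from" usually means embedding the base structure and recovering the outer object with a container_of-style macro. The sketch below shows that pattern in userspace; the _sketch structures, their fields and the size helper are assumptions for illustration, not the layout of the real or proposed kernel structures.

```c
#include <stddef.h>

/* Common description of the memory array, shared by all NAND flavours. */
struct nand_device_sketch {
    unsigned int page_size;
    unsigned int pages_per_block;
    unsigned int nblocks;
};

/* Each flavour embeds the base object and adds its own fields. */
struct rawnand_chip_sketch {
    struct nand_device_sketch base;
    int cs;
};

struct spinand_chip_sketch {
    struct nand_device_sketch base;
    unsigned int spi_mode;
};

/* Same idea as the kernel's container_of(), written out for userspace. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Common code only ever sees the base type. */
unsigned long long nand_device_size(const struct nand_device_sketch *nand)
{
    return (unsigned long long)nand->page_size *
           nand->pages_per_block * nand->nblocks;
}
```

Code such as bad block management can then take a struct nand_device_sketch pointer and work unchanged for raw NAND and SPI NAND, while flavour-specific code gets back to its own structure through the macro.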

Help wanted

Share your ideas or your problems, or propose an implementation. Test the proposed patches on your controllers/chips. At the driver level: convert your driver(s) to the new infrastructures – Boris can’t patch all the drivers. Review submissions from others.

