How I survived to a SoC with a terrible Linux BSP – Luca Ceresoli

Luca talks about his own experience with an embedded Linux system based on the very cheap Nuvoton N32926 (ARM926EJ-S@240MHz + H.264 + 64MB DDR2 on-package). It comes with a Linux BSP, but it sucks.

The ideal BSP has a mainline kernel, mainline U-Boot or Barebox, and good hardware documentation. This would give you established quality and community support, and a way forward to new products with different hardware. But this is not reality.

It starts with documentation. The Nuvoton website has a “datasheet” of 8 pages. When you buy the chip you can get documentation under NDA, so not something you can work with. You can also buy a devkit which has a DVD-ROM with software and some documentation, including the “design guide” which has a list of registers for several peripherals. That is sufficient if you already know other SoCs, so you can guess what it all means. Fortunately, the hardware works quite well so there is no need to find out about idiosyncrasies or errata.

The BSP is based on Linux 2.6.35.4. That is not even the latest stable release of that branch; it misses 11 months of bugfixes. Fortunately merging into 2.6.35.14 is possible with minimal conflicts. However, compared to 4.9 the difference is huge, including the introduction of device tree. The vendor tree is provided as 3 patch files with no explanation of why things are done, changing a total of 170K lines. And these patches contain bugs, e.g. the H.264 decoder crashes if there is packet loss in the stream. There are also features missing, e.g. GPIOs don’t have interrupt support. Power management is implemented using a proprietary API and doesn’t really work. And just looking at the code shows the horrible code quality, e.g. there are 521 #if 0 lines. In many places the driver models are not followed or proprietary APIs are invented. The result is not modular and full of hacks, e.g. the H.264 decoder buffer is allocated in early boot, so the decoder cannot be built as a module.
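A count like the 521 #if 0 lines is easy to reproduce on any vendor patch set. The following is a minimal sketch, not the method used in the talk: it assumes the patches are in unified diff format and counts only added lines that open an #if 0 block. The function name count_if0_added is made up for this example.

```python
import re

# Matches a preprocessor line that opens a disabled block,
# e.g. "#if 0" or "  # if 0", but not "#if 01" or "#ifdef".
IF0_RE = re.compile(r"^\s*#\s*if\s+0\b")


def count_if0_added(patch_text: str) -> int:
    """Count '#if 0' lines among the lines that a unified diff adds."""
    count = 0
    for line in patch_text.splitlines():
        # Added lines start with '+'. Lines starting with '+++' are
        # file headers and are skipped.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        # Strip the leading '+' before matching the source line itself.
        if IF0_RE.match(line[1:]):
            count += 1
    return count
```

To run it on a real patch, read the file as text (for example with open(path, errors="replace").read(), where path is whatever the vendor patch is called) and pass the result to count_if0_added.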

The BSP also contains a toolchain: gcc 4.2.1 (not even 4.2.4…), uClibc 0.9.29. That can be replaced relatively easily, but the toolchain needs 2.6.35 headers so prebuilt toolchains probably won’t work. So build a custom toolchain with crosstool-NG, Buildroot or OpenEmbedded.

Booting is pretty critical, but the BSP doesn’t contain an open source bootloader. It contains source code for a proprietary bootloader. It has a fairly unusual boot sequence. The boot ROM only initialises NAND (not DDR2 even though it’s in-package). It runs the NAND loader, which initialises external memory and continues loading from NAND. The next stage is NVT loader, which mounts a FAT partition on the NAND flash (with a FAT-on-NAND FTL which has a binary-only Linux module) that contains “conprog.bin”, which is actually a zImage with embedded initramfs. Also, when a specific GPIO is kept low, NVT loader will expose the FAT partition as a USB mass storage device. This allows the vendor to very easily distribute demos. There is no way to unbrick a device if NVT loader is broken. There is no TFTP boot and no cmdline args (so NFS boot requires writing a different kernel with a different hardcoded cmdline). The entire rootfs must be in initramfs, therefore it must fit in RAM and is not writable.

To improve, a first step is to use a squashfs as rootfs so it doesn’t have to stay fully in memory. From Linux, we can also easily use UBI – but this requires a switch_root from the initramfs, and NVT loader must be tweaked to reduce the FAT partition size. Upgrading via USB storage is not possible anymore.

But NVT loader can be bypassed completely: instead, load an Image file directly from the NAND loader. There is no need for a FAT partition anymore. To be able to upgrade, the loader kernel is never updated; instead it continues booting into another kernel with kexec.
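The hand-over from the fixed loader kernel to the updatable kernel can be sketched as follows. This is an illustration only: it assumes the kexec command from kexec-tools is available in the loader kernel's userspace, and these notes do not say how kexec was actually invoked. The kernel path and command line used in the example are hypothetical.

```python
import subprocess
from typing import List


def kexec_argv(kernel: str, cmdline: str) -> List[List[str]]:
    """Return the two kexec-tools invocations: load the kernel, then jump to it."""
    return [
        # Stage the new kernel image in memory with its command line.
        ["kexec", "--load", kernel, f"--command-line={cmdline}"],
        # Jump into the staged kernel without going through the boot ROM.
        ["kexec", "--exec"],
    ]


def boot_next_kernel(kernel: str, cmdline: str) -> None:
    """Run both invocations in order; does not return if the jump succeeds."""
    for argv in kexec_argv(kernel, cmdline):
        subprocess.run(argv, check=True)
```

A call such as boot_next_kernel("/mnt/update/zImage", "console=ttyS0 root=/dev/mtdblock2") would then be the last step of the loader kernel's init. Since the command line is supplied at this point, the second kernel does not need its own hardcoded one, assuming it honours the command line it is handed.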

Porting U-Boot would be a next step, but it is more work. It has the advantage that it allows you to use the same update and development tools as for other products. But that alone is not enough to justify the time.

Many SoC vendors have a protocol in boot ROM that requires a proprietary tool to flash an empty NAND. This is typically a closed-source Windows tool that is not scriptable. The Nuvoton tool has quite a good design, a lot can be done from this tool. But the protocol is not documented so it’s not possible to write your own tool. The tool uses a partition table in the NAND flash, which is used to communicate to NAND Loader and NVT loader how the NAND is partitioned. Nice feature but other tools (e.g. kernel) don’t know about it.
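One generic way to tell a Linux kernel about a flash layout it cannot discover by itself is the mtdparts= syntax of the MTD command-line partition parser. The sketch below builds such a string from a partition list. It is not something described in the talk, and it assumes the kernel is built with command-line partition parsing and that the MTD name of the NAND device is known. The device name nand0, the partition names and the sizes are invented for illustration and are not taken from the Nuvoton BSP.

```python
from typing import List, Optional, Tuple


def mtdparts(mtd_id: str, parts: List[Tuple[str, Optional[int]]]) -> str:
    """Build an mtdparts= argument from (name, size in KiB) pairs.

    A size of None means "the rest of the device" and is only
    accepted for the last partition.
    """
    fields = []
    for index, (name, size_kib) in enumerate(parts):
        if size_kib is None:
            if index != len(parts) - 1:
                raise ValueError("only the last partition may fill the remaining space")
            fields.append(f"-({name})")
        else:
            fields.append(f"{size_kib}k({name})")
    return f"mtdparts={mtd_id}:" + ",".join(fields)
```

With the invented layout [("loader", 512), ("kernel", 4096), ("rootfs", None)] this produces a string that can go into the kernel command line. The layout still has to be kept in sync by hand with the vendor partition table, because the two formats know nothing about each other.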

Because there is so much proprietary code (with sources), there aren’t many options for community or commercial support. Only the vendor can provide customer support. But the engineers that know about the chip are hidden behind several layers of resellers, FAEs etc. In addition, the time zone gives long latencies. Sometimes the issues don’t get solved even after several (1-day latency) e-mail iterations.

Compared to a well-supported SoC, this BSP leads to:

  • a lower quality product (known bugs, code is hard to trust);
  • extra time spent in development.

What can you do to improve things?

  • Assess and understand these issues as soon as possible, to influence the HW choice. Check if the chip is supported in upstream.
  • As a hobbyist, pick boards with good existing mainline support.
  • We should make it clear to chip marketing that a “mainline Linux support” label is valuable.

What can vendors do?

  • Make a better BSP, because happy engineers lead to more sales.
  • It takes less time for you AND for your customers if you don’t reinvent the wheel but use established practices.
  • Document things and don’t hide behind NDA. This allows…
  • … mainlining support for your product, which gives your customers free support.
  • Make hacker-friendly boards.

 
