Jmake: Dependable Compilation for Kernel Janitors – Julia Lawall, Inria

Software keeps getting bigger, and Linux is no exception. In addition, Linux is configurable, so not all code is built in any given configuration. Different kinds of developers are involved: casual contributors, maintainers and janitors.

Janitors clean up other people’s messes. They know coding style conventions and API changes. However, they don’t know the subsystems they touch in depth, and they often lack the means to test their changes properly. There is a risk of a silent compiler failure: the janitor modifies some code, compilation succeeds, but the compiler didn’t actually build the modified code because it was configured out.

JMake detects silent compiler failures to improve the reliability of janitor code. It grows out of the Coccinelle work: when doing this kind of tree-wide change, it is hard to make sure that you’re actually testing what you changed. People want immediate feedback on what they do, so an online tool that sends a mail is not appropriate. Even under allyesconfig, some parts are not built.

JMake looks at a diff and what gets built, and reports any modified line that did not get built. JMake can also find that the line can be built by compiling allyesconfig for a different architecture.

Tools available to JMake: make.cross, allyesconfig and the in-tree defconfigs. Trying all of them would take too long, so JMake uses heuristics. For files in arch/, the allyesconfig of that architecture is used. For drivers etc. it always starts with x86/allyesconfig. If that fails, it looks in the Makefile to see if there is a CONFIG variable associated with the C file. A final heuristic is to use the same arch as for the other lines in the patch.
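
The ordering of these heuristics can be sketched as follows. This is a minimal illustration in Python, not JMake's actual code: the function name and its parameters are hypothetical, and the notes do not say how a CONFIG variable is turned into an architecture, so that lookup is simply passed in as a list.

```python
from typing import List


def candidate_builds(path: str,
                     makefile_arches: List[str],
                     patch_arches: List[str]) -> List[str]:
    """Return build configurations to try for one file, most likely first.

    makefile_arches: architectures suggested by the Makefile CONFIG lookup
                     (hypothetical input, the lookup itself is not modelled).
    patch_arches:    architectures that worked for other lines of the patch.
    """
    parts = path.split("/")
    candidates: List[str] = []
    if parts[0] == "arch" and len(parts) > 1:
        # Files under arch/<name>/ are tried with that architecture.
        candidates.append(f"{parts[1]}/allyesconfig")
    else:
        # Everything else starts with x86/allyesconfig.
        candidates.append("x86/allyesconfig")
    # Fallbacks, in the order described in the talk.
    for arch in makefile_arches + patch_arches:
        config = f"{arch}/allyesconfig"
        if config not in candidates:
            candidates.append(config)
    return candidates
```

For example, `candidate_builds("drivers/net/foo.c", ["mips"], ["arm"])` tries x86 first, then mips, then arm.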

Header (.h) files are an extra challenge, because you don’t know where they will be used. An additional complication is that a header file with the same name is often used for different arches. Conditional compilation is also used especially heavily for .h files, i.e. a header is not included unless some symbol is defined.

To find out which lines are actually compiled, there are several options. You could look at the line numbers in the compiled code (.lst file), but this doesn’t work for macros. You could introduce a syntax error in the modified line and check that an error is reported, but this is unreliable and compiler-specific. JMake instead mutates the source code and verifies that the mutation shows up in the .i file (preprocessed code); if it does, it also verifies that the (unmodified) source file actually gets built. The mutation is done by adding a string (which never gets modified by the preprocessor) surrounded by characters that are not valid in C. If the mutation is in a macro, it will end up on a different line, but it will end up somewhere.
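
A rough Python sketch of the mutation idea, under stated assumptions: the marker format (`@"jmake_<line>"@`) and the helper names are made up for illustration, the preprocessing step itself is left out, and comments and other corner cases are ignored. JMake's real marker and implementation may differ.

```python
from typing import Iterable, List


def make_marker(line_number: int) -> str:
    # A string literal (left alone by the preprocessor) wrapped in '@',
    # which is not a valid character in C code.
    return f'@"jmake_{line_number}"@'


def mutate(source: str, modified: Iterable[int]) -> str:
    """Append a unique marker to every modified line (1-based numbers)."""
    wanted = set(modified)
    out: List[str] = []
    for number, line in enumerate(source.splitlines(), start=1):
        if number in wanted:
            marker = make_marker(number)
            stripped = line.rstrip()
            if stripped.endswith("\\"):
                # Keep the line continuation of a multi-line macro intact.
                line = f"{stripped[:-1]} {marker} \\"
            else:
                line = f"{line} {marker}"
        out.append(line)
    return "\n".join(out) + "\n"


def unbuilt_lines(preprocessed: str, modified: Iterable[int]) -> List[int]:
    """Return the modified lines whose marker is missing from the .i text."""
    return sorted(n for n in set(modified)
                  if make_marker(n) not in preprocessed)
```

Because each marker carries its line number, it does not matter that a marker inside a macro body reappears on a different line of the preprocessed output.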

To run jmake, you give it a commit ID or range of commit IDs to look at. It then goes through the above process. It looks at changes in blocks that are not interrupted by #ifdef. It reports for each file how it managed to build the modification: “make” = x86/allyesconfig; “make.cross ARCH=…” = ARCH/allyesconfig; “make.cross ARCH=foo:bar_defconfig” = with a different defconfig than allyesconfig; Failure = needs to be looked at manually. For a commit that affects 83 files it took about 8.5 minutes on Julia’s laptop.

Julia ran it on 11K commits. 96% of the modified non-arch files are visible in x86/allyesconfig. 365 .c files and 75 .h files outside of arch are not visible in x86 but are in some other arch, typically arch. 415 of the .c files do compile, but not all of their modified lines are compiled; 54 of these can be found in other arches, but in 361 cases JMake fails.

Some issues:

  • Config options that are never set.
  • Changes that are made in both the #ifdef and the #else branch can never be covered by a single configuration.
  • Changes inside #ifdef MODULE, because modules are not tested.

Julia made an objective definition of what a janitor is based on some metrics about the type of commits people make. Basically a janitor makes changes in a lot of different files, in different subsystems. She detected 21 janitor commits that do not get built on x86/allyesconfig.

The tool works well when you’re reacting to dependencies, e.g. adding new arguments to a function. It does not work when you create dependencies, e.g. adding const to a declaration – the latter would need to build all the users of that function, but JMake doesn’t do that.

Protecting Your System from the Scum of the Universe – Gilad Ben-Yossef, Arm Holdings

Gilad is the maintainer of the ARM TrustZone CryptoCell Linux device driver.

Smart devices are used for everything, so we need to be able to trust them. However, we also want a frictionless user experience and want to be able to do anything with the device. Keeping attackers out is therefore guaranteed to fail at some point, so we need a second line of defence: a trusted way of failing. If someone gets hold of our device, we don’t want them to have access to all our secrets or to get access to additional resources, and we want to know about it and be able to get them out again. We want trusted boot: reboot the device and it is safe again.

All the components are there; we just have to make them fit together.

Secure boot (Android style, but others are similar): a chain of trust through the boot process, in which each component verifies the next one. The ROM uses a public key stored in e.g. eFuses to verify the bootloader. The bootloader verifies the kernel and boot fs. The OS verifies the full rootfs. The root key can also be in flash with just a hash in eFuses, or it can be a certificate chain with just the hash of the root in eFuses.
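
The chain-of-trust idea can be illustrated with a small Python sketch. It is a simplification under stated assumptions: plain SHA-256 digests stand in for the signature checks, the "eFuse" value is just the digest of the first stage, and all names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Stage = Tuple[bytes, str]  # (payload, digest expected for the next stage)


def stage_digest(payload: bytes, next_digest: str) -> str:
    # The digest covers the payload and the embedded digest of the next
    # stage, so changing any later stage invalidates the chain.
    return hashlib.sha256(payload + next_digest.encode()).hexdigest()


def build_chain(payloads: List[bytes]) -> Tuple[str, List[Stage]]:
    """Return (root digest to store in eFuses, list of stages)."""
    next_digest = ""
    stages: List[Stage] = []
    for payload in reversed(payloads):
        stages.append((payload, next_digest))
        next_digest = stage_digest(payload, next_digest)
    stages.reverse()
    return next_digest, stages


def verify_chain(root_digest: str, stages: List[Stage]) -> bool:
    """Each stage is checked against the digest held by the previous one."""
    expected = root_digest
    for payload, next_digest in stages:
        if stage_digest(payload, next_digest) != expected:
            return False
        expected = next_digest
    return True
```

Only the root digest has to live in immutable storage; everything else can sit in ordinary flash.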

Checking rootfs is done with DM-verity. It prevents a persistent rootkit: if the persistent storage is changed, we will know. DM-verity adds hashes and signatures to a readonly filesystem using device-mapper. Check is done every time we access the filesystem, not at boot. It uses a Merkle tree of hashes of blocks to arrive at a root hash that can be verified through a signature. The Merkle tree is stored on the device, so we need to verify log4096(device size) hashes. Cfr. figure in the slides.
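
To make the Merkle tree concrete, here is a minimal Python sketch. It uses a binary tree for readability, which is an assumption for illustration only: dm-verity packs many hashes into each hash block, and its on-disk format is not modelled here. The function names are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, leaf hashes first, root level last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            # An unpaired node at the end is hashed with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            parents.append(h(left + right))
        level = parents
        levels.append(level)
    return levels


def auth_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes needed to recompute the root for one block."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index
        path.append(level[sibling])
        index //= 2
    return path


def verify_block(block: bytes, index: int,
                 path: List[bytes], root: bytes) -> bool:
    """Check one block against the trusted root hash."""
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

Reading one block only requires hashing that block plus one hash per tree level, which is why the check can be done on every access rather than once at boot.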

This works only for readonly devices: when a block changes, the entire Merkle tree changes. For read-write data, the simplest is using full-disk encryption (dm-crypt) which implicitly does authentication. dm-crypt is a device-mapper layer between the actual filesystem (e.g. ext4) and the block device (e.g. eMMC) so neither of these knows about the encryption. This uses a single key for everything accessing the device, and the key is kept in memory all the time. The key is password-protected.

Problem with whole-disk encryption: there can be multiple users, and it is not possible to avoid encryption for some use cases. For example, if the alarm clock app is in encrypted storage and the device reboots during the night, you have to give the password before the alarm clock can start running. fscrypt solves this by pushing encryption into the fs layer, which allows different or no encryption keys for different directories and files. So e.g. the alarm clock app may be encrypted with a key that is stored in the rootfs, while the sensitive information is encrypted with a user-provided password. Limitation of fscrypt: it doesn’t hide all metadata, e.g. file size is not encrypted. Multiple keys can be loaded separately into the kernel. When the key is available in the kernel, you can see the file. When the key is not loaded, you can see there is a file but not its name or content.

The problem is the key: it has to be put in the kernel and stay there, so it is vulnerable if the kernel is compromised. The solution is some trusted execution environment, e.g. TrustZone in ARM. TrustZone is a hypervisor mode (called TEE, Trusted Execution Environment) that has access to memory that the normal OS has no access to. The OS then asks the TEE to store the key in memory that is not accessible to the kernel. It is never possible to get it out again; to do encryption, the kernel asks the TEE to put the key in a hardware crypto engine.

Instead of a TEE, you can also use a Trusted Platform Module (TPM) that is discrete from the CPU. Keys are stored directly in there and never go to flash; they are even generated in the TPM, so they really never go to memory. But of course the TPM can still have bugs that can be exploited. The TPM can also do attestation: give access to certificates only if a certain set of hashes (of the HW and SW state) is provided in a certain order. This is done with the Integrity Measurement Architecture (IMA) subsystem in Linux. Attestation is a way to check a sequence of hashes without needing to store all the hashes.
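
Checking a sequence of hashes without storing them all relies on the TPM's "extend" operation, in which a register is replaced by the hash of its old value concatenated with the new measurement. A minimal Python sketch of that idea, assuming SHA-256 and hypothetical function names (this does not talk to a real TPM):

```python
import hashlib
from typing import Iterable


def extend(register: bytes, measurement: bytes) -> bytes:
    """New register value = H(old register || H(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(register + digest).digest()


def measure_sequence(components: Iterable[bytes]) -> bytes:
    """Fold a whole boot sequence into one fixed-size register value."""
    register = bytes(32)  # the register starts out as all zeroes
    for component in components:
        register = extend(register, component)
    return register
```

The final value depends on every measurement and on their order, so comparing a single 32-byte value is enough to tell whether the expected components were loaded in the expected sequence.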

Introducing the “Lab in a Box” Concept – Patrick Titiano & Kevin Hilman, BayLibre

BayLibre does HW and SW support for embedded (Linux) systems, and also does kernelci.

Lab in a box = PC with stuff to connect a board and running LAVA.

KernelCI is a distributed test farm on 250 boards doing 2700 boot-tests per day. Pulls from various trees, builds various configs, distributes them to the boards, and sends the results to the relevant mailing lists. The build servers that pull the git repos and build the kernels are centralized, the boards are distributed in a dozen labs. The Lab in a Box is an easy-to-set-up lab. Also AGL uses the KernelCI tools but with a different centralized test master.

The first reason for Lab in a box is to clean up the mess of cables and shelves. It should provide something that is easily maintainable, shareable and easily duplicated. It should also simplify the administration (deploying LAVA, which is already easier thanks to Docker; simplifying device descriptions; working out which tty device belongs to which board) by adding a web administration & control panel. Ultimately, it should accelerate deployment. Everything is in one case, including the software, but still at relatively low cost.

Challenges:

  • A lot of stuff in one box, not much space.
  • Power control, different boards have different supply requirements.
  • Make it easy to install a board so you don’t have to spend a day to install a board.

The box is a normal PC tower with

  • Celeron quadcore with 8GB RAM and 120GB SSD – required for LAVA.
  • Plenty of fans in case cooling is needed.
  • ATX power supply also powers the boards – it provides 5V and 12V. This saves on power wiring.
  • Home made power measurement and control board called ACME cape on BBB.
  • USB hub for network consoles + FTDI USB serial cables.
  • Network switch. Each device on a separate LAN. Lavabox itself needs internet access to connect to kernelCI.
  • 6 DUT: RPi3, BBB, Le Potato, DragonBoard, R-Car M3, SABRELight. They are installed in drive bays, so easy to insert and remove.

BOM cost (ex. DUTs) about 400 EUR, but you can use components that you already have to reduce costs.

The DragonBoard is flashed with fastboot, so it needs an extra USB cable to drive fastboot. Some boards don’t have Ethernet, so they use the NCM gadget or USB storage instead. Some devices are powered over USB.

LAVA slave provides the DHCP, TFTP, NFS, … It also knows (through config) the USB-serial ports to the boards (using udev rules, FTDI cables have a unique ID). Manages update through fastboot or USB storage where needed. lavapdu-daemon controls power of each board, with backends for various PDUs including BBB-ACME.

LAVA master schedules the tests. It also has the descriptions of the boards and gives them to the slaves (even though they are no use to the master, would be more logical to put this on the slaves). Board has a device-type (e.g. BBB, includes how to boot and give it a kernel) and device (instance, specifies which ports etc. to use). Written in jinja2 templating language, which makes it very powerful e.g. can set up custom things for a specific device.

squid proxy to avoid downloading the same kernel from kernelci.org over and over again.

All the pieces are in Docker containers and combined with a docker-compose.yml file.

Lab in a Box is an example; you can build your own and replace any of the components. The SW doesn’t depend on the specific HW, thanks to the LAVA abstraction layers that put everything in config files.

Achievements:

  • Less of a mess, can all fit in a nice box. Fits comfortably in an apartment.
  • Integrated SW and easy administration (still under development).
  • Good demonstrator to evangelise CI.
  • Easy to replace DUTs.

Limitations:

  • Tedious to build the PC case. Needs drilling and soldering.
  • Pretty densely packed so not easy to build.
  • Limit on DUT size to fit in a drive bay.
  • Only 5V and 12V DUTs, and must be balanced across ATX outputs. Can be solved with a higher-power ATX supply that has more rails.
  • No button presses for boards that need that, need separate relay for it.
  • Even with a larger case, it wouldn’t be possible to add more devices: wiring is the limitation.
  • Does not really scale to installations with dozens of boards. Develop a rack-mounted solution for that.
  • No standard DUT connector for the boards, it’s always custom wiring. Should work with board manufacturers to develop a standard connector.
  • Too complex and expensive for a 1-board lab. Build mini version for this use case.
  • Administrative control panel doesn’t exist yet – currently a YAML file.
  • No documentation (yet).

Competing project: a nanoPi hat that connects to a single DUT, that will be open hardware. Published on Tizen wiki.

BoF: Collaborating to Create the Secure OTA Update Systems for Linux – Alan Bennet & Ricardo Salveti, Open Source Foundries

There are too many open source OTA implementations doing similar things. There are a lot of pieces that could be shared, especially on the security side to make sure we get it right.

Requirements

  • Atomic updates. Also fast, the system can’t be offline for a long time. Easier if you have a readonly rootfs.
  • All pieces must be updateable, including bootloader.
  • Failsafe/rollback, using a watchdog. Necessary because you may have different versions of the hardware.
  • Verification of the image (incl. signing).
  • No vendor lock-in.
  • Trusted boot.

Two basic modes: block-based and file-based. Block-based (= A/B bank) usually implies a full update (but can also do a diff update). Examples: swupdate, mender, rauc, resinos. File-based update doesn’t overwrite a partition but individual files. There could still be multiple partitions, or overlays. The server side is more complex because it needs to calculate what needs to be updated. Examples: OSTree (used in several projects, e.g. Project Atomic, flatpak, AGL), swup (Intel-specific).
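
How an A/B bank scheme gives atomic updates with rollback can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class and its `boots_ok` callback (standing in for the watchdog or boot-success check) are hypothetical and do not reflect the API of any of the tools listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ABDevice:
    banks: Dict[str, str] = field(
        default_factory=lambda: {"A": "v1", "B": "v1"})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running(self) -> str:
        return self.banks[self.active]

    def update(self, image: str, boots_ok: Callable[[str], bool]) -> bool:
        target = self.inactive()
        # Write to the bank that is not in use; the running system is
        # untouched while this (slow) step happens.
        self.banks[target] = image
        # Switching the active bank is the only step that must be atomic.
        previous, self.active = self.active, target
        if not boots_ok(image):
            # The new image failed to come up: fall back to the old bank.
            self.active = previous
            return False
        return True
```

A failed update leaves the previously working image active, at the price of keeping two full copies of the system on the device.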

Trusted boot is still problematic. It is hardware-specific, and TEE is not widely used.

Secure software distribution is also not solved. HTTPS is obviously not enough, even if you check the HTTPS certificate. E.g. downgrade attack: send a valid, signed but vulnerable old version of the software to devices. There is a specification (based on Tor): The Update Framework (TUF) that enumerates what should be checked. E.g. docker and pip implement this.
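
The downgrade attack shows why a valid signature alone is not enough. A minimal Python sketch of the extra check, assuming dotted numeric version strings; the function names are hypothetical, and this is only one of the checks TUF describes, not its API:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # "1.10.0" -> (1, 10, 0), so comparison is numeric, not alphabetic.
    return tuple(int(part) for part in text.split("."))


def accept_update(installed: str, offered: str, signature_ok: bool) -> bool:
    """Accept only correctly signed images that are newer than what runs."""
    if not signature_ok:
        return False
    return parse_version(offered) > parse_version(installed)
```

An old release that was legitimately signed at the time passes the signature check, but is rejected by the version comparison.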

There was no time for discussion because Ricardo used all the time for the introductory presentation. However, there is a wiki page: https://elinux.org/Secure_OTA_Update

Automation beyond Testing and Embedded System Validation – Jan Luebbe, Pengutronix

Pengutronix builds embedded Linux systems for customers, everything below the application. In addition to the kernel, that includes mesa, wayland, Qt, chromium, gstreamer, …. All that changes all the time and sometimes breaks. This kind of testing is “solved” by Jenkins and LAVA.

Low Level Sensor Programing and Security Enforcement with MRAA – Brendan Le Foll, Intel Corporation

mraa.io is a simple userspace I/O protocol to unify a plethora of interfaces: UART, GPIO, I2C, ADCs (IIO), 1wire, …. MRAA is the API spec, libmraa is the C/C++ implementation. There are also bindings for python, nodejs and java, and unsupported bindings for a bunch of other languages (e.g. lua). Made for monkeys, so easier is better. On Linux, MRAA makes the I/O that is typically reserved for the kernel available to userspace. It’s mainly for quick prototyping, but turns out to be used in actual products. Platform quirks are abstracted, and lots of devboards are supported. E.g. it does the pinmuxing if necessary. Sometimes it even uses devmem if the crappy vendor kernel doesn’t allow things to be done properly.

Most calls are synchronous.

The GPIO interface allows registering an ISR with a callback function.

On top of this API, a sensor library has been added: UPM (Useful Plugins for MRAA). It gives code examples of how to use each sensor.

To add a board, there are 3 ways:

  • Raw mode: no platform definition, just map the pins to the kernel representation e.g. gpio numbers.
  • C platform configuration: same kind of mapping, but also override things where necessary.
  • JSON file is similar to raw mode but you can give names etc., just no overrides.

To do things like devmem manipulations safely, there is a daemon that checks permissions.

On Android, there is a peripheralmanager that authorizes access to GPIOs etc. This was reused to support MRAA, and a backend was added to libmraa that talks to peripheralmanager over Binder. This way, all the sensors become available on Android.

AFB is the equivalent of Binder in Automotive Grade Linux. Every application has a SMACK security context, and a binder in the same security context. The binder exposes the bindings that the application has access to (and only those). AFB doesn’t require the rest of AGL. To use MRAA with this, there is a global libmraa that actually talks to the kernel, and another libmraa in each application that talks to the binder which talks to the global libmraa. This way, each application can only access the messages that were meant for it. The two libmraas are in fact built differently. The application libmraa is built with BUILDARCH=AFB, which replaces all the normal kernel calls with calls to AFB’s binder. In a similar way, it is possible to build a libmraa that uses I/Os that are not directly accessible by the kernel, e.g. an extension board connected over UART.

An Overview of the Linux Kernel Crypto Subsystem – Boris Brezillon, Free Electrons

Boris gave an intro to crypto which I will not summarize here. See also https://youtu.be/dnGbhvweNb8

Crypto = transforming input data into something else. The implementation is the algorithm; the object is an instance that you can use to execute the algorithm and that contains state; it is called a tfm. Algorithms: cipher, hash, AEAD (called authenc in the kernel), HMAC and compression. Algorithms are combined, e.g. hmac(sha1) or authenc(hmac(sha1),cbc(aes)). How to code it: allocate the algorithm tfm with crypto_alloc_<algtype>, set callbacks, set the context (e.g. key, flags), feed in data with _request_set_crypt (= pass in data) and call crypto_<type>_<operation> to execute it, and finally free the request and the algorithm tfm. The API is asynchronous. Typically the encrypt operation returns -EINPROGRESS or -EBUSY and you wait for a completion, which is signalled in the callback set before.

To use kernel crypto from userspace, there are two competing solutions: the out-of-tree cryptodev and the mainlined AF_ALG. cryptodev is taken from OpenBSD. It creates a device node that is accessed with ioctls. OpenSSL supports this type. AF_ALG uses sockets with its own address family, and can be added to OpenSSL with an out-of-tree OpenSSL module. Most userspace programs don’t use AF_ALG. Boris did speed experiments with the Marvell CESA driver he implemented; for small blocks, the two are more or less equal; for larger blocks, cryptodev is slightly faster. However, a software implementation is even faster and doesn’t take much more CPU power. With 128 threads in parallel, AF_ALG is a bit faster. If energy consumption is important, that could change the conclusion again. But the conclusion is: if you need to choose between cryptodev and AF_ALG, perhaps it’s better not to use either of them. Better run some benchmarks.
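
For a feel of what AF_ALG looks like from userspace, here is a small sketch using Python's standard socket module, which exposes AF_ALG on Linux. The helper name is hypothetical, and the function returns None when AF_ALG is unavailable (non-Linux system, kernel built without it, or blocked by a sandbox) rather than failing.

```python
import hashlib
import socket
from typing import Optional


def sha256_via_af_alg(data: bytes) -> Optional[bytes]:
    """Hash data with the kernel crypto API, or return None if unavailable."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return None
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as algo:
            # Select the algorithm type and name, as registered in the kernel.
            algo.bind(("hash", "sha256"))
            # accept() returns the socket used for the actual operation.
            op, _ = algo.accept()
            with op:
                op.sendall(data)
                return op.recv(32)  # a SHA-256 digest is 32 bytes
    except OSError:
        return None


if __name__ == "__main__":
    kernel = sha256_via_af_alg(b"hello")
    print("kernel  :", kernel.hex() if kernel else "AF_ALG not available")
    print("hashlib :", hashlib.sha256(b"hello").hexdigest())
```

Every operation goes through several system calls, which fits the observation that a plain software implementation in userspace can be faster.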

The crypto API doesn’t distinguish between hardware and software implementations. So you register the crypto_alg subclass with the types of algorithms that are supported. Each algorithm that the engine supports is registered separately with a different name, e.g. “cbc(aes)” and “ecb(aes)”. There is also a driver-name that allows selecting that specific implementation of the same algorithm. A priority constant is used for automatic selection of the implementation. Various flags can be set, e.g. that it’s asynchronous.
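
The name, driver-name and priority mechanism can be modelled in a few lines of Python. This is a toy model, not kernel code: the class names and the driver names used in the example ("cbc-aes-sw", "cbc-aes-hw0", "cbc-aes-hw1") are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Implementation:
    name: str         # generic algorithm name, e.g. "cbc(aes)"
    driver_name: str  # identifies one specific implementation
    priority: int     # higher wins during automatic selection


class Registry:
    def __init__(self) -> None:
        self._impls: List[Implementation] = []

    def register(self, impl: Implementation) -> None:
        self._impls.append(impl)

    def lookup(self, name: str) -> Optional[Implementation]:
        """Match on the generic name or on a specific driver name."""
        matches = [i for i in self._impls
                   if name in (i.name, i.driver_name)]
        if not matches:
            return None
        # max() keeps the first of several equal-priority entries.
        return max(matches, key=lambda i: i.priority)
```

Asking for the generic name yields the highest-priority implementation, while asking for a driver name pins one specific implementation regardless of priority.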

When the crypto engine allocates a new tfm, the driver-specific buffer is also allocated by it and passed to the init function. The implementation must also implement setkey, encrypt and decrypt functions.

Because the algorithm is passed as a string, it is quite easy to add a new algorithm to the framework. But that makes the framework complex. Fortunately there is an extensive test suite that can be used to test a new driver. However, often there are several ways to implement the same thing (by composing in a different way). The way that subclassing is done is not consistent. The framework evolves and old drivers don’t use the new features, which makes it difficult to find the current best practices. Important details are sometimes hard to discover, e.g. that the completion callback should be called with softirqs disabled.

There is no way to do NAPI-style polling under heavy load: a driver that is async will always have to be based on interrupts. So using this for network encryption defeats the purpose of NAPI. Boris proposes to add a NAPI-like driver interface to the crypto subsystem.

The priority-based automatic selection will always select the same driver, so if you have two hardware crypto engines, only one of them will be used: the one with the highest priority, or the first one of equal priority. There should be load balancing, but the framework is not designed for it at all. To do that, we’d need a way to define occupation of a crypto engine and an estimate of the load (e.g. length of the request). When switching engine, the context also has to migrate. Boris proposes to do the load balancer at the driver level, i.e. you register all the engines that can be used interchangeably in a common load balancer, which itself will expose the crypto API.
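
The load-balancing idea can be sketched as a dispatcher that tracks how much work each engine has outstanding and uses the request length as the load estimate. This Python model is only an illustration of the concept: the class, its methods and the engine names are hypothetical, and context migration between engines is not modelled.

```python
from typing import Dict, List


class LoadBalancer:
    """Sends each request to the engine with the least outstanding bytes."""

    def __init__(self, engines: List[str]) -> None:
        self.pending: Dict[str, int] = {name: 0 for name in engines}

    def submit(self, length: int) -> str:
        # min() returns the first engine in case of a tie.
        engine = min(self.pending, key=lambda name: self.pending[name])
        self.pending[engine] += length
        return engine

    def complete(self, engine: str, length: int) -> None:
        # Called when the engine reports that a request has finished.
        self.pending[engine] -= length
```

With two engines, a large request on the first one causes the following small requests to go to the second until the outstanding load evens out, which is what priority-based selection alone cannot do.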

Question from the audience: shouldn’t there be an interface that the crypto user can use to allocate memory, so that the buffers are allocated in a way that the driver can access directly? Some hardware has specific restrictions on the buffer layout (e.g. no scatter-gather), which requires a memory copy if the buffer doesn’t meet them.