While developing software for an embedded system, you want to be sure that you’re going in the right direction and that you don’t break things. Testing the software is the easiest way to get feedback about the code you’ve written. However, the developer has to find a good balance between time spent on testing and time spent on development. As consultants for embedded open source technologies, we at Mind encounter many different approaches to testing with our customers. This article structures these varied experiences and combines best practices, techniques and tools, with a focus on embedded open source software.
I presented this content at the Embedded Linux Conference – Europe, 2010 in Cambridge. Slides are available in PDF and in ODP format. You can reuse them if you like.
The efficient software developer uses testing
We develop software because we want to make a working product. Therefore, validation is an essential part of the software development process. This validation should be done from a user’s perspective, thus shortly before the release and typically by different people than the developers. That makes the loop back to development very expensive: it has been a long time since you worked on the code so you don’t remember it well (or the original developer already left), it is difficult to pinpoint the cause of a problem because everything is already glued together, and you don’t have much time because the release is due soon. To tighten that loop, the software should be tested as soon as possible, during development and integration. This article focuses on the early testing, done by the software developer.
Loops back to development exist not only because of validation, but also because the software evolves over time: features are added, requirements shift, supporting libraries are upgraded, etc. Thus, after a while you start modifying existing code. Unfortunately, every modification may mean that you break something that used to work. At this point, it is difficult to create tests, since you don’t remember (exactly) what the software is supposed to do and not do. This is why agile methods stress testing so much: in agile methods, modifying existing code is much more important than writing brand new code. Pre-existing, automated tests reduce the threshold to modify code. They have to be automated to some extent, otherwise the threshold to actually run the tests becomes too high.
The key concept of testing for the software developer is thus saving time. We don’t aim for complete coverage of the code or of the features. We also don’t have to write the tests as a black box, without knowing the implementation of the module under test or access to its internals. The latter means that we can write tests based on the behavior of the implementation, and even with the support of the implementation code. For example, we can insert trace and assert code in the implementation, which is used by the test to compare with the expected trace and (lack of) assertions.
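The trace-and-assert idea can be sketched as follows. This is only an illustration: the `RingBuffer` class, the `trace` helper and the `TRACE` list are hypothetical names invented for this example, not part of any existing library.

```python
# Hypothetical module under test with built-in trace and assert code.
TRACE = []  # trace records collected while the implementation runs


def trace(event, **details):
    """Record an internal event so that a test can inspect it afterwards."""
    TRACE.append((event, details))


class RingBuffer:
    """Keeps the most recent `capacity` items and drops the oldest one."""

    def __init__(self, capacity):
        assert capacity > 0, "capacity must be positive"
        self.capacity = capacity
        self.items = []

    def push(self, item):
        if len(self.items) == self.capacity:
            dropped = self.items.pop(0)
            trace("drop", item=dropped)
        self.items.append(item)
        trace("push", item=item)
        # Internal invariant: the test fails if this assertion ever fires.
        assert len(self.items) <= self.capacity


def run_test():
    """Compare the recorded trace with the trace we expect."""
    TRACE.clear()
    buf = RingBuffer(capacity=2)
    for value in (1, 2, 3):
        buf.push(value)
    expected = [
        ("push", {"item": 1}),
        ("push", {"item": 2}),
        ("drop", {"item": 1}),
        ("push", {"item": 3}),
    ]
    return TRACE == expected
```

The test knows that the implementation drops the oldest item, and checks exactly that through the trace instead of through the public interface.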
Software developers can improve their efficiency by using testing. They should follow these guidelines.
- Make sure there is a test infrastructure from the very start of the project. It doesn’t have to be much, but if nothing is there it becomes increasingly difficult to create the infrastructure when the project grows.
- Make sure that every developer knows how to run the tests. The easiest way to do this is to automate them.
- Make sure the tests run fast. That of course means that they cannot be very complete. Complete testing is the responsibility of integration and of validation. Every software developer will run the tests after each change, and certainly before publishing changes to the rest of the team. If it takes a long time to run the tests, they will be delayed, which makes the development loop larger. Also it would delay publishing of changes, which makes the integration loop larger.
- Tailor the tests to your implementation. While developing, you know pretty well where the risks are of doing something wrong. For example, when doing string manipulation in C, the main risk is doing something wrong with the terminating 0 byte. Make a test that checks this specifically.
- Distinguish between specific tests and smoke tests. We only need to test the things we are currently modifying. Modifications can break things in two ways: they can break the existing features of the functionality we’re modifying, or they can break something unrelated (or expose an existing bug). For the first, we just need to test the functionalities that we’re modifying. This typically corresponds to a unit test, but it can be more on the integration level (when modifying the interface between modules, which happens quite often). Breakage of unrelated things is very often catastrophic (e.g. stack overflow, double free). Therefore, it is often sufficient to check that the system as a whole still works. For embedded systems, it’s usually sufficient to boot a system with all features enabled and check that it still does something.
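As an illustration of a test tailored to a known risk, here is a minimal sketch of the terminating 0 byte case from the list above. It is written in Python as a stand-in for the C code; `pack_c_string` is a hypothetical helper that fills a fixed-size field the way a C struct member would be filled.

```python
def pack_c_string(text, size):
    """Encode text into a fixed-size field that always ends in a 0 byte."""
    if size < 1:
        raise ValueError("field needs room for the terminator")
    raw = text.encode("ascii")[: size - 1]  # leave room for the terminating 0
    return raw + b"\x00" * (size - len(raw))


def test_terminator():
    """Check the risky cases: empty, exactly fitting, and too long."""
    for text in ("", "abc", "abcd", "abcdefgh"):
        field = pack_c_string(text, 4)
        assert len(field) == 4
        assert field[-1] == 0  # the terminator must never be overwritten
        assert field.rstrip(b"\x00") == text.encode("ascii")[:3]
    return True
```

The test does not try to be complete. It only probes the boundary where we know the implementation is most likely to go wrong.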
Embedded testing: test hardware, simulation, timing and updates
Testing for embedded systems is in some ways different than for general-purpose computers. First of all, there is an important hardware dependency, for instance analog audio input, a radio tuner, or a camera. However, the hardware may not be available (e.g. there are only 5 boards for 9 software developers). The embedded system is often very resource-constrained and doesn’t have the CPU power, memory or flash space to accommodate test infrastructure. And its I/O capabilities are usually rather limited, e.g. lack of writable filesystem for input data or saving traces. These physical limitations can be overcome by stubbing and simulation. Second, it interacts non-trivially with its environment. For instance, a video screen should show the video in real time and degrade gracefully when too many streams are shown simultaneously. These things make up the essential difference between the embedded system and a desktop media player, and are the reason you can’t simply use existing software as is. So these things should also be tested. Finally, updating the software once the embedded system has been sent into the field is completely different from updates of general-purpose computers. On the one hand you have complete control over the whole of the installed software, but on the other hand you can’t assume that the user can take manual action when something goes wrong. Therefore special attention has to be paid to the update procedure.
Test hardware setup
Since the embedded system software depends on the hardware, it is important to have a good setup of test hardware. This should be a concern for the validation team rather than for the developers. However, efficiency can be boosted if the validation team makes test hardware available to the developers as well. A good test hardware setup allows remote control of the I/Os and remote updates of the firmware, so that it can for instance be placed in an oven for testing. An nfsroot is a good solution to allow remote updates. Not just the I/O should be controlled remotely, also power cycling. This makes it possible to test the behavior when faced with sudden power loss.
As an example, consider testing a wireless metering device. The test setup could consist of two of these devices: one with the actual firmware under test, the other is a controller that provides radio input and monitors radio output. Both of them are network-connected to be accessible for testing. Another example is an audio processing board, where the (analog) audio inputs and outputs are connected to a PC that generates sine waves and samples the output.
To be able to perform testing close to the developer, we can perform simulation. The most obvious form of simulation is using a virtual machine, for instance KVM, VirtualBox, or qemu. This allows you to simulate the entire system, including the kernel. It has several disadvantages, though. First, you will probably need to add new peripheral simulators for your particular device. Creating such a peripheral simulator correctly can be very tricky. Second, the simulators are not entirely reliable (especially when it comes to peripherals). Thus, you may end up debugging problems which don’t actually occur on the system, but only in the simulator. Finally, simulation carries a speed penalty. For virtual machines (KVM, VirtualBox), the speed penalty is limited to the times when virtualization kicks in, e.g. when serving interrupts or accessing peripherals. It is therefore rather affordable, unless you spend a lot of time interacting with hardware. For emulation (qemu), the penalty kicks in for every instruction. However, since the development server often runs an order of magnitude faster than the target platform, emulation may still turn out to be faster than running it on the actual system.
An alternative approach is to run your application code natively on the development host. In this case, you don’t try to simulate the entire system, but only the (user-space) application code. To make this possible, you need to add a Hardware Abstraction Layer (HAL) to your application, which has a different implementation on the development host and on the target platform. If you heavily use standard libraries, these often already form a HAL. For instance, Qt and GLib have different implementations depending on the platform they are compiled for. The HAL is in addition a good way to make sure the application is easy to port to new hardware. If the application consists of several interacting processes, it is usually advisable to test each one in isolation. Using D-Bus for the IPC simplifies this, since you can replace the bus with a program that gives predefined reactions.
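A HAL with one implementation per platform could look like the sketch below. All names are hypothetical, and the target implementation assumes a file that contains the temperature in millidegrees. This is an assumption for the example, not a statement about your hardware.

```python
from abc import ABC, abstractmethod


class TemperatureSensor(ABC):
    """HAL interface: the only thing the application code depends on."""

    @abstractmethod
    def read_celsius(self):
        """Return the current temperature in degrees Celsius."""


class HostTemperatureSensor(TemperatureSensor):
    """Development-host implementation: replays predefined readings."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def read_celsius(self):
        return float(next(self._readings))


class TargetTemperatureSensor(TemperatureSensor):
    """Target implementation: reads a file assumed to hold millidegrees."""

    def __init__(self, path):
        self._path = path

    def read_celsius(self):
        with open(self._path) as f:
            return int(f.read().strip()) / 1000.0


def overheated(sensor, limit=85.0, samples=3):
    """Application code under test: true if all samples exceed the limit."""
    readings = [sensor.read_celsius() for _ in range(samples)]
    return all(value > limit for value in readings)
```

On the development host, a test simply calls `overheated(HostTemperatureSensor([90, 91, 92]))`. The application function itself does not change between host and target.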
Running the application on the development host has several advantages. First of all, you have a much larger set of debugging tools available on the development host, including debugger, IDE, valgrind, SystemTap, and unlimited tracing. Second, it is often much faster than either simulation or running it on the target platform. Finally, deployment is also a lot quicker: you can compile and immediately run the application, without requiring a reboot or writing anything to flash.
Whatever the simulation approach, it also has to be made reproducible. That typically means that inputs are taken from a file instead of the normal channels (network, A/D, sensors, FPGA, …). Also outputs go to a file instead of to the normal channels, to allow off-line analysis. Creating reproducible inputs is even useful on the target platform itself, where you can debug the full system including timing.
Embedded systems show a lot of time-dependent behavior. Part of this is hidden in the HAL (e.g. timeouts of devices), but often also the application itself has time as one of its inputs. For example, a video display unit has to synchronise several streams for simultaneous display, or a DSP algorithm has to degrade gracefully when the processor is overloaded. Also race conditions in a multi-thread program depend on the timing. This time-dependent behavior is hard to make reproducible, especially when using simulation.
On the target platform, the time-dependent behavior can be approximated fairly well. The only requirement is that the simulation of inputs (see above) also includes information about the time at which this input is available. The thread that parses the input adds delays to match the timestamps in the input file. If the input timestamp has already passed, this is equivalent to a buffer overflow in e.g. DMA and is probably an error. Clearly, the HAL should be carefully thought out to make this scheme possible, e.g. sizing buffers so they match the size of DMA buffers.
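The timestamped replay described above can be sketched like this. The input format (one `<seconds> <payload>` record per line) and all names are assumptions made for the example. The clock and sleep functions are parameters so that the replay logic itself can be tested without real delays.

```python
import time


class InputOverrun(Exception):
    """Raised when a record's timestamp has already passed."""


def parse_records(lines):
    """Parse lines of the assumed form '<seconds> <payload>'."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        stamp, payload = line.split(maxsplit=1)
        records.append((float(stamp), payload))
    return records


def replay(records, deliver, clock=time.monotonic, sleep=time.sleep):
    """Deliver each payload at its timestamp, relative to the start time."""
    start = clock()
    for stamp, payload in records:
        delay = (start + stamp) - clock()
        if delay < 0:
            # Comparable to a DMA buffer overflow: the consumer is too slow.
            raise InputOverrun(f"input at t={stamp}s is {-delay:.3f}s late")
        sleep(delay)
        deliver(payload)


class FakeClock:
    """Deterministic clock for testing the replay logic itself."""

    def __init__(self):
        self.now = 0.0

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds
```

On the target, `replay` would be called with the default `time.monotonic` and `time.sleep`, and `deliver` would feed the data into the HAL in place of the real input channel.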
One possibility for making timing reproducible in simulation is to simulate time as well. The simulator keeps track of the simulated time of each thread. Every thread (including the input thread) adds delays to the simulated time; the delays should correspond (more or less) to the amount of processing time it would take on the target platform. Whenever a thread communicates with another thread or with the HAL, a synchronisation point is added: the thread blocks until the simulated time of all other threads has reached its own simulated time. This concept was invented by Johan Cockx at Imec.
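A minimal sketch of this idea, using ordinary threads and a shared table of simulated times, is shown below. It is our own simplified illustration and not the original implementation. The class and function names are hypothetical, and the delays are arbitrary numbers chosen so that no two events share a timestamp.

```python
import threading


class SimClock:
    """Tracks one simulated time per thread and provides sync points."""

    def __init__(self, names):
        self._cond = threading.Condition()
        self._times = {name: 0 for name in names}

    def advance(self, name, delay):
        """Account for `delay` units of simulated processing time."""
        with self._cond:
            self._times[name] += delay
            self._cond.notify_all()

    def sync(self, name):
        """Block until all other threads have reached our simulated time."""
        with self._cond:
            self._cond.wait_for(
                lambda: all(t >= self._times[name] for t in self._times.values())
            )
            return self._times[name]

    def finish(self, name):
        """A finished thread must never hold the others back."""
        with self._cond:
            self._times[name] = float("inf")
            self._cond.notify_all()


def worker(clock, name, delays, log):
    for delay in delays:
        clock.advance(name, delay)  # pretend to compute for `delay` units
        now = clock.sync(name)      # synchronisation point before communicating
        log.append((name, now))     # the "communication": a shared event log
    clock.finish(name)


def run_demo():
    clock = SimClock(["A", "B"])
    log = []
    plans = (("A", [3, 3]), ("B", [2, 2, 3]))
    threads = [
        threading.Thread(target=worker, args=(clock, name, delays, log))
        for name, delays in plans
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

However the host happens to schedule the two threads, `run_demo()` records the events in simulated-time order: B at 2, A at 3, B at 4, A at 6 and B at 7. This makes a timing-dependent scenario reproducible.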
Unlike PCs, embedded systems are very easy to “brick”. If something goes wrong while updating the firmware, it is very difficult to recover from that, because it’s not possible to boot from e.g. USB or CD-ROM. Often, the device isn’t even easily reachable: the controller of a firetower buoy in the middle of the ocean just has a network connection; if something goes wrong with an upgrade, somebody has to travel in a boat for two days to recover it – assuming they can find it in the first place.
Therefore, for embedded systems it is essential that the update system works and never fails. It is mainly the responsibility of the validation team to test if it works, but the developer has a much better insight into where it can go wrong. The developer should take into account the following in the update mechanism.
- Power failure in the middle of the update, which corrupts the root filesystem or kernel. To protect against this, the updated software should be installed in parallel with the existing software. Links should be updated only after successful installation, and this should be done atomically (i.e. using rename(2), not editing a file in-place). Package managers usually take care of this pretty well. Of course, a journalled filesystem is needed as well to avoid corruption of the filesystem itself.
- Integrity of the software, which may be jeopardized by e.g. data loss over a serial connection or premature termination of a network connection. Package managers protect against this with a hash and signature.
- Installation of incompatible pieces of firmware. Again, package managers help to protect against this.
- Installation of firmware that is not compatible with the hardware. This is most pressing for the kernel and boot loader, but also other pieces of software may have a strong dependency on the hardware. A package manager can help by creating a platform_name virtual package and depending on it.
Clearly, package managers help a lot to secure the update system. However, they can’t be used on read-only filesystems (e.g. squashfs). Other solutions need to be found in that case.
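When no package manager is available, the first two points (parallel installation with an atomic switch, and an integrity check) can be sketched as follows. The directory layout (`releases/<name>` and a `current` symlink) and all function names are assumptions for this example. It assumes a POSIX system, where `os.replace` maps to rename(2).

```python
import hashlib
import os
import shutil
import tempfile


def install_release(root, name, payload, expected_sha256):
    """Install next to the existing releases, then atomically switch 'current'."""
    release_dir = os.path.join(root, "releases", name)
    os.makedirs(release_dir, exist_ok=True)
    image = os.path.join(release_dir, "firmware.bin")
    with open(image, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on storage before switching

    # Integrity check on what was actually written.
    with open(image, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_sha256:
        shutil.rmtree(release_dir)
        raise ValueError("hash mismatch, keeping the current release")

    # Create the new link under a temporary name, then rename it over the old
    # one. The rename is atomic: 'current' points to the old or the new release.
    tmp_link = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.join("releases", name), tmp_link)
    os.replace(tmp_link, os.path.join(root, "current"))


def demo():
    """Install v1, attempt a corrupted v2, then install a correct v2."""
    seen = []
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current")
        v1, v2 = b"firmware v1", b"firmware v2"
        install_release(root, "v1", v1, hashlib.sha256(v1).hexdigest())
        seen.append(os.readlink(current))
        try:
            install_release(root, "v2", v2, "0" * 64)  # deliberately wrong hash
        except ValueError:
            pass
        seen.append(os.readlink(current))
        install_release(root, "v2", v2, hashlib.sha256(v2).hexdigest())
        seen.append(os.readlink(current))
    return seen
```

A real updater would need more than this, for instance syncing the containing directory and verifying a signature rather than only a hash. The sketch only shows that the switch to the new software is a single rename that happens after verification.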
Open source tools for embedded software testing
The testing situation for open source software is unfortunately not very good. Although there is a lot of software to support integration and validation, there is not so much to help the developer. Also, open source software often doesn’t have a good test framework.
Unit test frameworks abound. Wikipedia lists 33 different ones, opensourcetesting.org has 36. However, a unit test framework helps the developer only a little. The developer doesn’t need a summary of failed tests, coverage analysis or complex reporting. Fixtures are interesting, but for the developer it is usually easier to code them directly. The developer doesn’t want to run a complete test, only the things which he modified: here a unit test framework can help, if it allows selection of specific tests. Still, it is then necessary to code the tests carefully so each test covers one specific feature. Usually this is not the case. And anyway, usually the unit tests run pretty fast so it isn’t necessary to select a specific one. That said, it is still a good idea if the developers use a unit test framework. It reduces the threshold for other developers to create tests for new modules, because they can look at existing tests. And the validation team will be happy to reuse the tests that developers have already written.
QtTestLib is a unit test framework that is worth mentioning. It has the usual support for fixtures and for data sets. What makes it stand out is that it also has (basic) support for UI testing. It therefore adds something significant for the developer of an application with a (Qt) graphical user interface: there is no longer any need to make mocks of the UI, because the test framework can send simulated input events instead (and all Qt GUI objects allow querying of their current state so there’s no need to watch what is displayed).
What developers really need is the ability to insert testing into the code itself. Assertions and tracing are a good start, and most basic libraries (Qt, GLib) provide these. A great tool is the documentation test (doctest) in Python: it allows you to put the test itself right next to the code. AFAIK, however, no similar tools exist outside of Python.
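For readers who have not seen a documentation test, here is a small sketch. The `clamp` function is a made-up example. The examples in its docstring are both documentation and test.

```python
import doctest


def clamp(value, low, high):
    """Limit value to the range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


def run_doctests():
    """Run the examples embedded in clamp's docstring."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(clamp.__doc__, {"clamp": clamp}, "clamp", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)  # counts of failed and attempted examples
```

In everyday use you would not write `run_doctests` yourself: running `python -m doctest yourmodule.py`, or calling `doctest.testmod()` from the module, collects and runs all docstring examples. The explicit runner above only makes the example self-contained.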
A surprisingly useful test tool is D-Bus. If you use D-Bus as the inter-process communication mechanism of your system, you can isolate each component. Write a script (in one of the many languages for which D-Bus has bindings) that registers the interfaces that your component communicates with, and make a mock implementation of all methods. The test script can call methods and send signals, and check that the correct replies are received and that everything else that should happen does happen. You can also insert D-Bus signals in the code under test, which allows the test script to verify the internals of the implementation.
Testing open source code
The main reason why we build embedded systems using Linux is to use the huge range of libraries and software that is available. We often need to modify these (and contribute back of course). However, most open source software lacks a useful set of developer tests. This makes it more difficult for contributors to make modifications. The contributor obviously does some ad-hoc testing to ascertain that the needed functionality works, but this doesn’t guarantee that it’ll also work in a different environment than the contributor’s. The contributor is often not intimately familiar with the project in the first place: they discover that something is lacking and just implement it. The result is buggy patches which don’t get accepted upstream. Finally, if there are no clear tests, the contributor is also not very motivated to change the test to verify the added features.
GStreamer is an example of an open source framework that could improve its testing. GStreamer requires a test to exist before a new element is accepted in gst-plugins-good. However, in practice, there is typically just one test per element, that just checks the basic functionality. When a new feature is added to an element, it is seldom added to the test. Running the complete test suite takes a long time, and even running the tests for one element (which is not very easy) can take a long time (e.g. for the souphttpsrc element). Creating a good test is not easy, either: setting up a basic pipeline is easy, but inserting special situations (e.g. timing issues, event handling) is not trivial.
The following could be done to improve the situation for GStreamer.
- Create test infrastructure that simplifies testing of functionality that all elements should provide: caps (re)negotiation for the different caps it provides, handling of QoS and other events, handling of state changes, etc.
- Put the tests closer to the source code, so that contributors see they exist and remember to update them.
- Where necessary, split tests so an individual test runs fast. And make it easier to run an individual test.
Note that the above does not mention adding new tests. This is just to make sure testing will be better for future changes. Adding new tests to existing code is too difficult to be effective.
The Linux kernel, on the other hand, does provide some interesting support for testing. First, for almost all types of devices there also exists some kind of dummy driver. For USB gadget, for instance, there is a dummy_hcd driver (to emulate USB gadget hardware and test USB gadget functionality) and a zero driver (to emulate USB gadget functionality and test a USB gadget hardware driver). For MTD devices there is a block2mtd driver that emulates an MTD device on top of a block device (that can itself be a loopback device). A second interesting test feature of the kernel is the Linux Test Project (LTP), particularly for filesystem and networking. The entire suite is certainly too large to be used as a developer test. However, it is rather simple to select a specific test (although it may be hard to find which one is relevant). More importantly, the existing tests have good boilerplate and are a good source of inspiration for adding tests.
It cannot be stressed enough that the software developer should already start testing. He will naturally test what he writes, and that should be made available to other developers as well. This is even more true for open source software, because there are more contributors and less validation.