Free Electrons provides upstream kernel support for a number of SoC vendors, so it’s in their interest to make sure that upstream kernel changes don’t break those SoCs. They also need continuous testing of the changes they make themselves. That’s why they invested in a board farm and CI.
KernelCI is responsible for continuously building kernels and for aggregating and processing the test results. So the only thing that still has to be done is to run the actual tests. With kernelCI, regressions can be detected before they ever reach users, even before they’re merged in mainline. Currently more than 200 boards run more than 2000 boot tests per day.
KernelCI tracks several upstream git repos for new commits, e.g. arm/arm-soc.git and next/linux-next.git. It automatically builds defconfigs for ARM, ARM64, MIPS and x86. The resulting images, together with the tests to run on them, are sent to the labs that have board farms and do the testing. KernelCI then collects the results.
So Free Electrons built a test automation farm. It controls the boards, launches tests on them, and gathers the results. The framework for this is LAVA – Linaro Automated Validation Architecture. It powers boards on and off, automates boot testing, and runs tests simultaneously on all boards. You can define a lot more tests than what is required for kernelCI, e.g. bootloader tests. It can be driven by kernelCI or by a different CI infrastructure, e.g. Jenkins.
LAVA has one master per farm that works with N dispatchers. The master schedules the tests requested by CI on the dispatchers. Each dispatcher controls a set of boards and is physically connected to them (power and serial). Power is controlled with pdud. Serial is redirected to telnet with ser2net. Files to upload are made available through TFTP. A configuration file per device specifies how to control power (hard_reset_command, power_off_cmd) and how to connect to serial (connection_command). LAVA v2 has an API instead of configuration files, but kernelCI doesn’t generate the proper test specifications for LAVA v2 yet. Common parts are put in a device-type configuration, e.g. bootloader commands.
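As a sketch, such a per-device configuration could look like the fragment below. The configuration keys are the ones named above; the board name, hostnames, ports and the pduclient invocation are made-up examples, not the lab’s actual setup:

```
# Hypothetical LAVA v1 per-device configuration.
# Hostnames, port numbers and the device type are invented for illustration.
device_type = beaglebone-black
hard_reset_command = pduclient --daemon pdu-server --hostname pdu01 --port 3 --command reboot
power_off_cmd = pduclient --daemon pdu-server --hostname pdu01 --port 3 --command off
connection_command = telnet dispatcher01 2003
```

Everything shared between boards of the same model (bootloader commands, boot prompts) would live in the device-type configuration instead, so the per-device file stays down to power and serial details.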
LAVA is started through its API, by submitting a job with a kernel, dtb and initramfs for a certain device type. LAVA will then search for an available dispatcher with an available device of that type and send the job to it. The dispatcher can still modify the artifacts, e.g. a zImage -> uImage conversion. It then starts recording and runs the upload, reset, test cycle. The resulting log is sent back to the user (i.e. kernelCI) over the API.
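Roughly, a LAVA v1 job definition submitted over the API was a JSON document along these lines. This is a hedged sketch from memory: the URLs and the job name are invented, and the exact action names and parameters varied between LAVA versions:

```json
{
  "device_type": "beaglebone-black",
  "job_name": "mainline-boot-test",
  "actions": [
    {
      "command": "deploy_linaro_kernel",
      "parameters": {
        "kernel": "http://storage.example.com/zImage",
        "dtb": "http://storage.example.com/am335x-boneblack.dtb",
        "ramdisk": "http://storage.example.com/rootfs.cpio.gz"
      }
    },
    { "command": "boot_linaro_image" }
  ]
}
```

The kernel, dtb and initramfs URLs point at the images that kernelCI built; the device_type lets the master pick any free board of that model.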
For power control there were 3 options: a PDU, which can go in a rack but is very expensive and not directly usable; a network-controlled multisocket, but no 8-socket version was available in Europe; or remotely controlled relays, which are very flexible, small and cheap, but more work (you have to cut wires and screw them in). Remotely controlled relays were chosen. They exist in USB and TCP variants; TCP was more convenient here because USB was already heavily used. For the wiring, they use a single supply per voltage (5V, 12V) that feeds several boards of the same type. In fact, an ATX supply gives both 5V and 12V, with enough amps to power 8 boards. To protect the system, TVS diodes are added. To make sure the ATX supply stays turned on, nPS_ON is shorted to GND. Only fully ATX-powered boards need a real supply per board; there the relay doesn’t switch the power lines themselves, but the nPS_ON-to-GND connection.
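A rough power-budget sanity check shows why one ATX supply per drawer works. The per-board current draw and the rail rating below are assumptions for illustration, not figures from the talk:

```python
# Hypothetical power budget for one drawer: 8 boards sharing one ATX supply.
# Per-board draw and rail rating are assumed values, not from the talk.
BOARDS = 8
BOARD_5V_A = 1.5       # assumed draw per board on the 5V rail, in amps
ATX_5V_RAIL_A = 20.0   # typical 5V rail rating of a commodity ATX supply

total_5v = BOARDS * BOARD_5V_A
print(f"5V rail load: {total_5v:.1f} A of {ATX_5V_RAIL_A:.1f} A")
assert total_5v <= ATX_5V_RAIL_A, "one ATX supply cannot feed the drawer"
```

With these assumed numbers the 8 boards stay comfortably under the rail rating; the same check applies to the 12V rail for the boards powered from it.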
To connect serial, USB-serial converters are used, so lots of (powered) USB hubs are needed. They need to be powered because some boards draw current through their serial port (bad level-shifter design). In addition to that there are also Ethernet cables and switches.
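Mapping each USB-serial converter to a telnet port with ser2net takes one configuration line per board. A sketch, with made-up device paths and port numbers:

```
# Hypothetical /etc/ser2net.conf entries: one TCP (telnet) port per
# USB-serial converter. Format: port:state:timeout:device:options
2000:telnet:0:/dev/ttyUSB0:115200 8DATABITS NONE 1STOPBIT
2001:telnet:0:/dev/ttyUSB1:115200 8DATABITS NONE 1STOPBIT
```

The per-device connection_command in the LAVA configuration then simply telnets to the matching port on the dispatcher.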
The boards are mounted in drawers for easy access, normally 8 boards per drawer (4 for ATX-powered ones). Each drawer additionally holds the USB hub, the Ethernet switch, the relay board, the ATX power supply, and a normal multisocket. The boards are kept in place with Velcro straps.
There is room for 50 boards; 35 are in use now, of 30 different types.
There were some problems with the LAVA master machine: because there are so many connections to it, they had to use a specifically-configured kernel. Some boards require hardware modifications, e.g. one needs a button press to actually boot after power is supplied, so that button has to be shorted. Also, anything can fail: a faulty serial cable, a service on the LAVA master that didn’t restart on reboot, … Patches to LAVA itself were even needed to support some of the boards.
It is convenient to be able to use the same boards remotely for manual testing. However, with LAVA there is no direct access to the boards. So they created Lavabo, the LAVA board overseer, which allows a user to take control of a board. It authenticates users with ssh keys installed on the server. Clients connect to lavabo-server over ssh. lavabo-server tells LAVA to stop scheduling jobs on that board, then reads the LAVA configuration files to set up access to the board and exposes that access over the ssh connection. It’s currently limited to running on the same server as LAVA itself.