Live Atomic update:
- Live == Online == Update while it is running.
- Atomic == all or nothing; if the update fails, you go back to the old version.
Atomic update is traditionally done by restarting the whole system, using either A-B partitioning or a rescue partition. Richard works at Baserock, where the atomic update is done by rebooting into a new btrfs subvolume after extracting the new bits into a cloned subvolume.
Why atomic? It helps support because they see exactly what you’re running, and you don’t risk rendering the system unusable if the update is interrupted – per-file atomic write is not sufficient. So you need an atomic filesystem update.
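To illustrate why per-file atomic writes are not enough, here is a minimal sketch. The updater, the file names `app` and `libfoo`, and the simulated failure are all hypothetical and not from the talk. Each file is replaced atomically with the usual write-then-rename trick, yet an interruption between two files still leaves a mixed system.

```python
import os
import tempfile

FILES = ["app", "libfoo"]  # hypothetical files that must match each other


def atomic_write(path, data):
    """Replace one file atomically: write a temp file, then rename over it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see the old or the new file, never a mix


def update(root, version, fail_after=None):
    """Rewrite every file in turn; optionally simulate an interruption."""
    for i, name in enumerate(FILES):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated power loss")
        atomic_write(os.path.join(root, name), version)


def demo():
    """Install v1, then interrupt the v2 update after the first file."""
    with tempfile.TemporaryDirectory() as root:
        update(root, "v1")
        try:
            update(root, "v2", fail_after=1)
        except RuntimeError:
            pass
        result = {}
        for name in FILES:
            with open(os.path.join(root, name)) as f:
                result[name] = f.read()
        return result
```

After the interrupted update, `app` is at v2 while `libfoo` is still at v1: every individual write was atomic, but the system as a whole is in a state that was never released.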
In an atomic filesystem update, you first create the new version of the filesystem. This is done with a combination of btrfs subvolumes and bind-mounting. Then the mount tree is reproduced, and a pivot_root is applied to switch to the new rootfs. But of course the old processes are still pointing to the old versions on the old rootfs.
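The last point can be seen without root privileges. The sketch below is only an analogy on a single file, not the pivot_root procedure itself: an already-open file descriptor stands in for an old process, and the file name is made up.

```python
import os
import tempfile


def old_fd_survives_replacement():
    """Return (what the old descriptor reads, what a fresh open reads)."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "libfoo")  # hypothetical file name
        with open(path, "w") as f:
            f.write("old")

        long_running = open(path)  # stands in for a process started earlier

        new_path = path + ".new"
        with open(new_path, "w") as f:
            f.write("new")
        os.replace(new_path, path)  # the path now names the new file

        with long_running:
            seen_by_old = long_running.read()
        with open(path) as f:
            seen_by_new = f.read()
        return seen_by_old, seen_by_new
```

On Linux the old descriptor keeps reading the old contents while a fresh open sees the new ones. The same holds for a process's root and working directory after a pivot_root, which is why the running processes need to be migrated somehow.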
Richard's first idea was to use ptrace to chroot the process and to reopen all the fds. Problems: not all processes can be ptraced, not all processes are allowed to execute chroot, and some processes (e.g. journald) cache inode numbers, which will change after reopening.
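The inode problem can be sketched as follows. The directory layout and file name are invented for illustration, and the `(st_dev, st_ino)` pair is only an assumption about the kind of identity a process might cache.

```python
import os
import tempfile


def identity(path):
    """The (device, inode) pair a process might cache to recognise a file."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def reopen_changes_identity():
    """Create the same file in an 'old' and a 'new' tree and compare them."""
    with tempfile.TemporaryDirectory() as base:
        paths = {}
        for tree in ("old", "new"):
            directory = os.path.join(base, tree, "var", "log")
            os.makedirs(directory)
            paths[tree] = os.path.join(directory, "journal")
            with open(paths[tree], "w") as f:
                f.write("same content")

        cached = identity(paths["old"])    # remembered before the migration
        reopened = identity(paths["new"])  # seen after reopening in new tree
        return cached != reopened
```

Even though the path and contents are identical, the reopened file is a different inode, so any cached identity no longer matches.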
An alternative to pivot_root is to use renameat, but that doesn't solve migrating the processes.
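The notes do not say how renameat would be used. One common pattern, assumed here, is to keep each version in its own directory and atomically repoint a symlink at the active one. The names `current`, `v1` and `v2` are hypothetical. Python's `os.rename` with `src_dir_fd`/`dst_dir_fd` maps onto renameat(2).

```python
import os
import tempfile


def switch_version(base_dir, link_name, target):
    """Atomically repoint base_dir/link_name at target."""
    dir_fd = os.open(base_dir, os.O_RDONLY | os.O_DIRECTORY)
    try:
        tmp_name = link_name + ".tmp"
        # Create the new symlink under a temporary name first...
        os.symlink(target, tmp_name, dir_fd=dir_fd)
        # ...then rename it over the live one; the rename is atomic.
        os.rename(tmp_name, link_name, src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
    finally:
        os.close(dir_fd)


def demo():
    """Point 'current' at v1, then switch it to v2."""
    with tempfile.TemporaryDirectory() as base:
        os.mkdir(os.path.join(base, "v1"))
        os.mkdir(os.path.join(base, "v2"))
        switch_version(base, "current", "v1")
        first = os.readlink(os.path.join(base, "current"))
        switch_version(base, "current", "v2")
        second = os.readlink(os.path.join(base, "current"))
        return first, second
```

New lookups through `current` see either the old or the new version, never a mixture, but processes that already hold files or directories open in the old version are unaffected, which is the limitation noted above.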
Alternative approach: fake atomicity by making a filesystem transaction that would be rolled back on failure. That turns out not to work because the API isn't powerful enough.
Yet another alternative: use freeze so you can restore to the old context on (power) failure.
Alternative approach 3: let init propagate the migration down, so keep the old one around until all processes have terminated.
Alt. approach 4: use a layer in between (e.g. aufs) so you just add the new layer on top.
Actually, it turns out that most services will have to be restarted anyway (perhaps gracefully without dropping connections, like Apache can do). So it’s mainly for handing over the shells, and there the ptrace approach can be used.