Krzysztof is on a team that mostly works on persistent memory programming; syscall_intercept is a satellite project. The source is at https://github.com/pmem/syscall_intercept.
libpmemfile is a filesystem implemented entirely in userspace, with persistent memory as its backend. It is not FUSE-based: nothing goes to the kernel. syscall_intercept is part of libpmemfile. It patches all the system calls and replaces them with jumps to a hook function, so it’s like LD_PRELOAD, but for syscalls instead of libc functions.
syscall_intercept patches the code. To be able to do that, it first disassembles the code with libcapstone to find the syscalls, then finds their context (not always trivial, or even possible), and hotpatches the code with a jump. It only patches libc, which in most cases is the only library making syscalls, but it is also possible to patch the entire .text of the binary (except libsyscall_intercept itself and libcapstone). A single syscall hook function checks the syscall number argument to decide what to do.
Capstone is an open-source disassembly framework. It is used to iterate through all instructions and check whether each one is a syscall. The next instruction also has to be examined, to see whether it can be relocated and whether it depends on the instruction pointer. The syscall can’t be replaced with a direct call or jump to a C function, because the arguments and the stack have to be set up first, so there is a wrapper routine that does that. For each syscall instance, a wrapper is instantiated that jumps back directly to the original address, which avoids problems with the stack etc. Since on x86_64 a syscall instruction is 2 bytes while a long jump is 5 bytes, space has to be made. If the subsequent instructions can be relocated, they are moved into the wrapper instance. Otherwise, the library looks for a nearby hole of 5 bytes and issues a short jump to it. There are a few other fallback solutions too.
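To make the size problem concrete, here is a minimal sketch of the two x86_64 jump encodings involved. It is not taken from syscall_intercept; the function names encode_jmp_rel32 and encode_jmp_rel8 are hypothetical. A near jump with a 32-bit offset (opcode E9) needs 5 bytes, so it doesn’t fit over the 2-byte syscall instruction (0F 05), whereas a short jump with an 8-bit offset (opcode EB) is exactly 2 bytes but only reaches about 128 bytes in either direction, which is why the hole has to be nearby.

```c
#include <stdint.h>

enum {
    SYSCALL_LEN   = 2, /* syscall:   0F 05          */
    JMP_REL8_LEN  = 2, /* jmp rel8:  EB + 1 byte    */
    JMP_REL32_LEN = 5  /* jmp rel32: E9 + 4 bytes   */
};

/*
 * Encode a `jmp rel32` placed at address `from` that lands on `to`.
 * The displacement is relative to the end of the jump instruction.
 * Returns 0 on success, -1 if the target is out of 32-bit range.
 * (Hypothetical helper, for illustration only.)
 */
int encode_jmp_rel32(uint8_t out[5], uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t bits = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    out[1] = (uint8_t)(bits & 0xFF);          /* little-endian offset */
    out[2] = (uint8_t)((bits >> 8) & 0xFF);
    out[3] = (uint8_t)((bits >> 16) & 0xFF);
    out[4] = (uint8_t)((bits >> 24) & 0xFF);
    return 0;
}

/*
 * Encode a `jmp rel8` placed at `from` that lands on `to`.
 * This one has the same length as a syscall instruction, but the
 * target must be within -128..+127 bytes of the end of the jump.
 * Returns 0 on success, -1 if the target is too far away.
 * (Hypothetical helper, for illustration only.)
 */
int encode_jmp_rel8(uint8_t out[2], uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + JMP_REL8_LEN));
    if (disp < INT8_MIN || disp > INT8_MAX)
        return -1;

    out[0] = 0xEB;
    out[1] = (uint8_t)(int8_t)disp;
    return 0;
}
```

In this sketch, overwriting a syscall in place would use encode_jmp_rel8 to reach a 5-byte hole, and the hole would then hold the output of encode_jmp_rel32 pointing at the wrapper. When the following instructions can be relocated, the 5-byte jump can go directly over the syscall plus the 3 bytes after it.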
SYS_clone is a special case: afterwards you have two processes with different stack pointers, so simply restoring registers doesn’t work, and a complicated workaround is needed. rt_sigreturn and ptrace also cause problems, and those don’t have a workaround.
syscall_intercept is just an SDK. You need to write a library that is loaded with LD_PRELOAD and that does something useful in the syscall hooks. To avoid loops, any actual syscall made by the library has to use syscall_no_intercept() instead of syscall().
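As an illustration of what such a hook might look like, here is a self-contained sketch of a single hook function that dispatches on the syscall number and injects an error for one syscall. It is deliberately not linked against syscall_intercept: the name example_hook, the HOOK_* constants and the return convention (0 means "handled, use *result", nonzero means "execute the original syscall") are assumptions modelled on the hook shape in the project’s documentation, so check the repo for the real interface.

```c
#include <errno.h>
#include <sys/syscall.h>

/* Assumed return convention, see the note above. */
enum { HOOK_HANDLED = 0, HOOK_PASS_THROUGH = 1 };

/*
 * Hypothetical hook: receives the syscall number and its six raw
 * arguments. If it handles the call, it stores the value the caller
 * should see in *result (a negative errno on failure, as the kernel
 * would return it) and reports HOOK_HANDLED.
 *
 * In a real hook library, any syscall issued from inside this
 * function would have to go through syscall_no_intercept() rather
 * than syscall(), otherwise the hook would call itself.
 */
int example_hook(long syscall_number,
                 long arg0, long arg1, long arg2,
                 long arg3, long arg4, long arg5,
                 long *result)
{
    /* The arguments are unused in this small example. */
    (void)arg0; (void)arg1; (void)arg2;
    (void)arg3; (void)arg4; (void)arg5;

    switch (syscall_number) {
    case SYS_unlinkat:
        /* Error injection: pretend file removal is not permitted. */
        *result = -EPERM;
        return HOOK_HANDLED;
    default:
        /* Everything else goes to the kernel untouched. */
        return HOOK_PASS_THROUGH;
    }
}
```

With the real library, a function of this shape would be registered as the hook from a constructor in the preloaded library, and the program would then be started with LD_PRELOAD pointing at that library.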
This can be used, for example, to build a replacement for strace that doesn’t make any extra syscalls and just logs every syscall. This is one of the examples in the repo.
One problem comes up when running the program under GDB: you don’t want to instrument gdb itself.
Code is patched only once, so generated code or dynamically loaded code is not hooked. Also handwritten assembly that uses some tricks or non-standard ways of issuing a syscall could be problematic.
Other things you can do with this library: error injection, a faster strace, userspace device emulation (which is basically the libpmemfile use case). The same approach could also be applied to instructions other than syscalls, as long as they are recognisable in the disassembly.
syscall_intercept is currently x86_64 only. It could be extended with other arches supported by libcapstone, but that would require supporting their syscall interface.
Interesting question from the audience: could the vDSO approach have been used instead of hotpatching? Neither the speaker nor the audience knew an answer to this.