Heterogeneous programming: offload work to specialised cores. For Altera (now Intel PSG), this includes FPGAs. The kernel needs to support this.
FPGAs are configured with a bitstream. This can be a full configuration, or a partial (re)configuration of a specific section. Typical workflow: write the design in HDL, generate a bitstream, load it onto the FPGA. The FPGA can talk to the system interconnect, which can include cache coherency between the FPGA and the CPUs.
Reasons to use an FPGA: high performance and lower power consumption than a GPU.
Existing technologies: Linux FPGA manager (still under development) to program and reconfigure FPGA; OpenCL to partition tasks across heterogeneous computing elements; high-level synthesis to generate bitstream from OpenCL code.
In OpenCL, there is a host application, written in C, C++, …, that calls kernels. The kernels are C99 functions that are synthesized for the FPGA (or compiled for x86 or ARM).
Terminology: I/O interconnect = what is used to connect FPGA to CPU (PCIe, AXI, …); Accelerator Function (AF) = set of kernels programmed in FPGA or GPU; dynamic insertion/removal = (partial) dynamic reconfiguration of FPGA with new set of AF.
If PCIe is used as the interconnect, the FPGA can be discovered; with AXI it cannot. In that case device tree overlays are used to describe the FPGA, or rather the AF programmed on it. AF descriptors are compiled to overlays. A device/resource manager searches for an FPGA that matches the attributes required by the AF, loads the function, and generates the overlay if necessary.
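The matching step of the device/resource manager can be sketched roughly as follows. This is only an illustration, not the framework from the talk: the `Fpga` and `AcceleratorFunction` classes, the attribute names (`family`, `logic_cells`) and the `place` function are all hypothetical. The sketch assumes numeric requirements are minimums and everything else must match exactly.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption from the notes: PCIe devices can be discovered, AXI ones cannot.
DISCOVERABLE_BUSES = {"pcie"}


@dataclass
class Fpga:
    """Hypothetical description of one FPGA known to the manager."""
    name: str
    bus: str                                  # "pcie", "axi", ...
    attributes: dict = field(default_factory=dict)
    loaded_af: Optional[str] = None           # name of the AF currently loaded


@dataclass
class AcceleratorFunction:
    """Hypothetical AF descriptor: a name plus required attributes."""
    name: str
    required: dict


def matches(fpga, af):
    """True if the FPGA satisfies every attribute the AF requires."""
    for key, wanted in af.required.items():
        have = fpga.attributes.get(key)
        if have is None:
            return False
        if isinstance(wanted, (int, float)) and not isinstance(wanted, bool):
            if have < wanted:                 # numeric: treated as a minimum
                return False
        elif have != wanted:                  # anything else: exact match
            return False
    return True


def place(af, fpgas):
    """Pick the first free, matching FPGA and mark the AF as loaded on it.

    Returns (fpga, needs_overlay), or None if nothing matches.
    needs_overlay is True when the bus is not discoverable, so the AF
    would have to be described with a device tree overlay.
    """
    for fpga in fpgas:
        if fpga.loaded_af is None and matches(fpga, af):
            fpga.loaded_af = af.name          # stands in for loading the bitstream
            return fpga, fpga.bus not in DISCOVERABLE_BUSES
    return None


boards = [
    Fpga("fpga0", "pcie", {"family": "a", "logic_cells": 100}),
    Fpga("fpga1", "axi", {"family": "b", "logic_cells": 500}),
]
fir = AcceleratorFunction("fir", {"family": "b", "logic_cells": 200})
result = place(fir, boards)
```

Here the AF lands on `fpga1`, and because that FPGA sits on AXI the manager would also have to generate an overlay for it. A real manager would additionally have to handle partial reconfiguration regions and vendor-specific plugins, which this sketch leaves out.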
The framework should be vendor-agnostic but allow vendor-specific plugins.