Using AutoFDO with the Linux kernel

The kernel supports AutoFDO builds when using the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization) is a profile-guided optimization (PGO) method used to optimize binary executables. It uses hardware sampling to gather information about how frequently different code paths within a binary are executed. This information is then used to guide the compiler’s optimization decisions, resulting in a more efficient binary.

The AutoFDO optimization process involves the following steps:

  1. Initial build: The kernel is built with AutoFDO options without a profile.

  2. Profiling: The kernel from the initial build is then run with a representative workload to gather execution frequency data. This data is collected using hardware sampling, via perf. AutoFDO is most effective on platforms that support advanced PMU features such as LBR (Last Branch Record) on Intel machines.

  3. AutoFDO profile generation: The perf output file is converted to an AutoFDO profile via offline tools.

  4. Optimized build: The Clang compiler uses the AutoFDO profile to guide its optimization decisions during recompilation. The compiler focuses on optimizing the frequently executed code paths, resulting in more efficient code.

  5. Deployment: The optimized kernel binary is deployed and used in production environments, providing improved performance and reduced latency.

In a production environment, profiling can be applied directly to the deployed kernel, which removes the need for the initial build step.

AutoFDO can substantially improve the kernel’s performance. It is especially advantageous for workloads that are constrained by front-end stalls.

Preparation

Configure the kernel with:

CONFIG_AUTOFDO_CLANG=y
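
Before starting a long build, it can help to confirm that the option really ended up in the generated configuration. The following is a minimal sketch, assuming the config file is a standard kernel .config; the function name autofdo_enabled is hypothetical and is not a script shipped with the kernel.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given kernel config file
# enables AutoFDO build support. Defaults to ./.config.
autofdo_enabled() {
    config_file=${1:-.config}
    # -x matches the whole line, so the commented-out form
    # "# CONFIG_AUTOFDO_CLANG is not set" is rejected.
    grep -qx 'CONFIG_AUTOFDO_CLANG=y' "$config_file"
}
```

For example, "autofdo_enabled .config && echo enabled" prints "enabled" only when the option is set to y.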

Customization

You can enable or disable the AutoFDO build for individual files and directories by adding a line similar to the following to the respective kernel Makefile:

  • For enabling a single file (e.g. foo.o)

    AUTOFDO_PROFILE_foo.o := y
    
  • For enabling all files in one directory

    AUTOFDO_PROFILE := y
    
  • For disabling one file

    AUTOFDO_PROFILE_foo.o := n
    
  • For disabling all files in one directory

    AUTOFDO_PROFILE := n
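
When several of these overrides accumulate across a tree, it can be useful to list them in one place. The sketch below searches Makefile and Kbuild files for both the per-directory and the per-object form shown above. The function name list_autofdo_overrides is hypothetical, and the search pattern is an assumption based only on the assignment forms in this section.

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for every AUTOFDO_PROFILE
# override found under the given source tree (default: current directory).
list_autofdo_overrides() {
    tree=${1:-.}
    # /dev/null is passed as an extra file so grep always prints the
    # file name, even when find hands it a single file.
    find "$tree" -type f \( -name Makefile -o -name Kbuild \) \
        -exec grep -nE \
        '^[[:space:]]*AUTOFDO_PROFILE(_[^[:space:]:=]+)?[[:space:]]*:?=' \
        /dev/null {} +
}
```

Running "list_autofdo_overrides drivers/" would show each directory or object under drivers/ that opts in or out, one match per line.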
    

Workflow

Here is an example workflow for an AutoFDO kernel:

  1. Build the kernel on the HOST machine, with the AutoFDO build config:

    CONFIG_AUTOFDO_CLANG=y
    

    and

    $ make LLVM=1
    
  2. Install the kernel on the TEST machine.

  3. Run the load tests. The ‘-c’ option in perf specifies the sample event period. We suggest using a suitable prime number, like 500009, for this purpose.

    • For Intel platforms:

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
      
    • For AMD platforms:

      $ perf record -e RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
      
  4. (Optional) Download the raw perf file to the HOST machine.

  5. Generate the AutoFDO profile. Two offline tools are available for this purpose: create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part of the AutoFDO project (https://github.com/google/autofdo). The llvm-profgen tool is included in LLVM itself.

    $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
    

    or

    $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary -o <profile_file>
    

    Note that multiple AutoFDO profile files can be merged into one via:

    $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
    
  6. Rebuild the kernel using the AutoFDO profile file.

    CONFIG_AUTOFDO_CLANG=y
    

    and

    $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
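
The profiling step of the workflow above differs between Intel and AMD only in the event name, so it can be wrapped in a small script. The following is a sketch under stated assumptions: the function names autofdo_perf_event and autofdo_record, and the PERF and AUTOFDO_PERIOD environment overrides, are hypothetical and introduced only for illustration. The event names, the perf options, and the 500009 period are taken from the workflow above.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU vendor string (the vendor_id field of
# /proc/cpuinfo) to the branch event used in the profiling step.
autofdo_perf_event() {
    case "$1" in
        GenuineIntel) echo 'BR_INST_RETIRED.NEAR_TAKEN:k' ;;
        AuthenticAMD) echo 'RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k' ;;
        *) echo "unsupported CPU vendor: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: run perf record for the given vendor.
# Usage: autofdo_record <vendor> <perf_file> <loadtest> [args...]
# PERF and AUTOFDO_PERIOD are illustrative overrides, not perf features.
autofdo_record() {
    vendor=$1
    perf_file=$2
    shift 2
    event=$(autofdo_perf_event "$vendor") || return 1
    "${PERF:-perf}" record -e "$event" -a -N -b \
        -c "${AUTOFDO_PERIOD:-500009}" -o "$perf_file" -- "$@"
}
```

On the TEST machine, the vendor string can be read with "awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo" and passed as the first argument. Setting PERF=echo prints the command line instead of running it, which is a convenient way to review the options before profiling.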