Message-ID: <CAF1bQ=Qi9hyKbc5H3N36W=MukT3321rZMCas0ndpRf0YszAfOA@mail.gmail.com>
Date: Mon, 21 Oct 2024 17:00:01 -0700
From: Rong Xu <xur@...gle.com>
To: Masahiro Yamada <masahiroy@...nel.org>
Cc: Alice Ryhl <aliceryhl@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
Arnd Bergmann <arnd@...db.de>, Bill Wendling <morbo@...gle.com>, Borislav Petkov <bp@...en8.de>,
Breno Leitao <leitao@...ian.org>, Brian Gerst <brgerst@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, David Li <davidxl@...gle.com>,
Han Shen <shenhan@...gle.com>, Heiko Carstens <hca@...ux.ibm.com>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>, Jann Horn <jannh@...gle.com>, Jonathan Corbet <corbet@....net>,
Josh Poimboeuf <jpoimboe@...nel.org>, Juergen Gross <jgross@...e.com>,
Justin Stitt <justinstitt@...gle.com>, Kees Cook <kees@...nel.org>,
"Mike Rapoport (IBM)" <rppt@...nel.org>, Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>, Nicolas Schier <nicolas@...sle.eu>,
"Paul E. McKenney" <paulmck@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Sami Tolvanen <samitolvanen@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Wei Yang <richard.weiyang@...il.com>, workflows@...r.kernel.org,
Miguel Ojeda <miguel.ojeda.sandonis@...il.com>, Maksim Panchenko <max4bolt@...il.com>, x86@...nel.org,
linux-arch@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kbuild@...r.kernel.org, linux-kernel@...r.kernel.org,
llvm@...ts.linux.dev, Sriraman Tallam <tmsriram@...gle.com>,
Krzysztof Pszeniczny <kpszeniczny@...gle.com>, Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v4 6/6] Add Propeller configuration for kernel build.
On Sun, Oct 20, 2024 at 10:49 AM Masahiro Yamada <masahiroy@...nel.org> wrote:
>
> Please remove the period at the end of the commit subject.
Will fix this.
>
>
>
> On Tue, Oct 15, 2024 at 6:34 AM Rong Xu <xur@...gle.com> wrote:
> >
> > Add the build support for using Clang's Propeller optimizer. Like
> > AutoFDO, Propeller uses hardware sampling to gather information
> > about the frequency of execution of different code paths within a
> > binary. This information is then used to guide the compiler's
> > optimization decisions, resulting in a more efficient binary.
> >
> > The support requires a Clang compiler LLVM 19 or later, and the
> > create_llvm_prof tool
> > (https://github.com/google/autofdo/releases/tag/v0.30.1). This
> > submission is limited to x86 platforms that support PMU features
>
>
> "This submission" -> "This commit"
Will fix this.
>
>
>
> > like LBR on Intel machines and AMD Zen3 BRS.
> >
> > For Arm, we plan to send patches for SPE-based Propeller when
> > AutoFDO for Arm is ready.
>
>
> "we plan to send ..." is not a good description once it is committed.
>
> This sentence should be moved to the cover letter, or reworked.
We will move this sentence to the cover letter.
>
>
>
>
>
>
> >
> > Here is an example workflow for building an AutoFDO+Propeller
> > optimized kernel:
> >
> > 1) Build the kernel on the HOST machine, with AutoFDO and Propeller
>
>
> Why is the "HOST" capitalized?
We will fix this.
>
>
>
> > build config
> > CONFIG_AUTOFDO_CLANG=y
> > CONFIG_PROPELLER_CLANG=y
> > then
> > $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>
> >
> > “<autofdo_profile>” is the profile collected when doing a non-Propeller
> > AutoFDO build. This step builds a kernel that has the same optimization
> > level as AutoFDO, plus a metadata section that records basic block
> > information. This kernel image runs as fast as an AutoFDO optimized
> > kernel.
> >
> > 2) Install the kernel on test/production machines.
> >
> > 3) Run the load tests. The '-c' option in perf specifies the sample
> > event period. We suggest using a suitable prime number,
> > like 500009, for this purpose.
> > For Intel platforms:
> > $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
> > -o <perf_file> -- <loadtest>
> > For AMD platforms:
> > The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
> > # To see if Zen3 supports LBR:
> > $ cat /proc/cpuinfo | grep " brs"
> > # To see if Zen4 supports LBR:
> > $ cat /proc/cpuinfo | grep amd_lbr_v2
> > # If the result is yes, then collect the profile using:
> > $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
> > -N -b -c <count> -o <perf_file> -- <loadtest>
> >
> > 4) (Optional) Download the raw perf file to the HOST machine.
>
>
> Same question as above.
Will use "host".
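As a side note on the CPU check in step 3 above, here is a minimal sketch of the same test written as a shell function. It reads /proc/cpuinfo by absolute path and matches the flag as a whole word, so "brs" does not match "ibrs". The function name has_cpu_flag and its optional second argument are only assumptions for illustration; they are not part of the patch.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word on a
# "flags" line. CPUINFO_FILE defaults to /proc/cpuinfo.
has_cpu_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw -- "$flag"
}

# Example use on an AMD machine:
#   if has_cpu_flag brs || has_cpu_flag amd_lbr_v2; then
#       echo "branch sampling is available"
#   fi
```

Matching on whole words is the reason the original grep pattern carries a leading space in " brs".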
>
>
> >
> > 5) Generate Propeller profile:
> > $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
> > --format=propeller --propeller_output_module_name \
> > --out=<propeller_profile_prefix>_cc_profile.txt \
> > --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
> >
> > “create_llvm_prof” is the profile conversion tool, and a prebuilt
> > binary for linux can be found on
> > https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build
> > from source).
> >
> > "<propeller_profile_prefix>" can be something like
> > "/home/user/dir/any_string".
> >
> > This command generates a pair of Propeller profiles:
> > "<propeller_profile_prefix>_cc_profile.txt" and
> > "<propeller_profile_prefix>_ld_profile.txt".
> >
> > 6) Rebuild the kernel using the AutoFDO and Propeller profile files.
> > CONFIG_AUTOFDO_CLANG=y
> > CONFIG_PROPELLER_CLANG=y
> > and
> > $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
> > CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
> >
> > Co-developed-by: Han Shen <shenhan@...gle.com>
> > Signed-off-by: Han Shen <shenhan@...gle.com>
> > Signed-off-by: Rong Xu <xur@...gle.com>
> > Suggested-by: Sriraman Tallam <tmsriram@...gle.com>
> > Suggested-by: Krzysztof Pszeniczny <kpszeniczny@...gle.com>
> > Suggested-by: Nick Desaulniers <ndesaulniers@...gle.com>
> > Suggested-by: Stephane Eranian <eranian@...gle.com>
>
>
>
> >
> > .. only:: subproject and html
> > diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst
> > new file mode 100644
> > index 000000000000..a217354e0f95
> > --- /dev/null
> > +++ b/Documentation/dev-tools/propeller.rst
> > @@ -0,0 +1,161 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================================
> > +Using Propeller with the Linux kernel
> > +=====================================
> > +
> > +This enables Propeller build support for the kernel when using the Clang
> > +compiler. Propeller is a profile-guided optimization (PGO) method used
> > +to optimize binary executables. Like AutoFDO, it utilizes hardware
> > +sampling to gather information about the frequency of execution of
> > +different code paths within a binary. Unlike AutoFDO, this information
> > +is then used right before the linking phase to optimize (among other
> > +things) block layout within and across functions.
> > +
> > +A few important notes about adopting Propeller optimization:
> > +
> > +#. Although it can be used as a standalone optimization step, it is
> > + strongly recommended to apply Propeller on top of AutoFDO,
> > + AutoFDO+ThinLTO or Instrument FDO. The rest of this document
> > + assumes this paradigm.
>
> This is a hard requirement instead of a recommendation
> because PROPELLER_CLANG has "depends on AUTOFDO_CLANG".
Actually, PROPELLER_CLANG does not need to depend on AUTOFDO_CLANG.
We should be able to apply Propeller on top of a vanilla kernel build.
I admit that we did not do a good job of separating the two in this patch.
>
>
>
>
> > +
> > +#. Propeller uses another round of profiling on top of
> > + AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
> > + "build-afdo - train-afdo - build-propeller - train-propeller -
> > + build-optimized".
> > +
> > +#. Propeller requires LLVM 19 release or later for Clang/Clang++
> > + and the linker(ld.lld).
> > +
> > +#. In addition to LLVM toolchain, Propeller requires a profiling
> > + conversion tool: https://github.com/google/autofdo with a release
> > + after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
> > +
> > +The Propeller optimization process involves the following steps:
> > +
> > +#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
> > + you would normally do, but with a set of compile-time / link-time
> > + flags, so that a special metadata section is created within the
> > + kernel binary. The special section is only intended to be used by
> > + the profiling tool; it is not part of the runtime image, nor does
> > + it change the kernel's runtime text sections.
> > +
> > +#. Profiling: The above kernel is then run with a representative
> > + workload to gather execution frequency data. This data is collected
> > + using hardware sampling, via perf. Propeller is most effective on
> > + platforms supporting advanced PMU features like LBR on Intel
> > + machines. This step is the same as profiling the kernel for AutoFDO
> > + (the exact perf parameters can be different).
> > +
> > +#. Propeller profile generation: Perf output file is converted to a
> > + pair of Propeller profiles via an offline tool.
> > +
> > +#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
> > + binary as you would normally do, but with a compile-time /
> > + link-time flag to pick up the Propeller compile time and link time
> > + profiles. This build step uses 3 profiles - the AutoFDO profile,
> > + the Propeller compile-time profile and the Propeller link-time
> > + profile.
> > +
> > +#. Deployment: The optimized kernel binary is deployed and used
> > + in production environments, providing improved performance
> > + and reduced latency.
> > +
> > +Preparation
> > +===========
> > +
> > +Configure the kernel with::
> > +
> > + CONFIG_AUTOFDO_CLANG=y
>
>
> This is automatically met due to "depends on AUTOFDO_CLANG".
Agreed. But we will remove the dependency of PROPELLER_CLANG on AUTOFDO_CLANG,
so we will keep this part.
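Since the documented workflow keeps both options in the config, a quick sanity check on a given .config could look like the sketch below. The helper name config_enabled is hypothetical and only for illustration; it simply greps for an exact "CONFIG_<OPTION>=y" line.

```shell
#!/bin/sh
# config_enabled OPTION [CONFIG_FILE]
# Hypothetical helper: succeeds if CONFIG_<OPTION>=y is present.
# CONFIG_FILE defaults to .config in the current directory.
config_enabled() {
    grep -q "^CONFIG_$1=y\$" "${2:-.config}"
}

# Example use from the top of a configured build tree:
#   config_enabled AUTOFDO_CLANG && config_enabled PROPELLER_CLANG \
#       || echo "enable both options before building" >&2
```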
>
>
>
> > + CONFIG_PROPELLER_CLANG=y
> > +
> > +Customization
> > +=============
> > +
> > +You can enable or disable Propeller build for individual file and
> > +directories by adding a line similar to the following to the
> > +respective kernel Makefile:
>
> The same comment as in 1/6.
We will fix this in the same way as the change proposed in 1/6, if you
think that change is acceptable.
>
>
>
> > +- For enabling a single file (e.g. foo.o)::
> > +
> > + PROPELLER_PROFILE_foo.o := y
> > +
> > +- For enabling all files in one directory::
> > +
> > + PROPELLER_PROFILE := y
> > +
> > +- For disabling one file::
> > +
> > + PROPELLER_PROFILE_foo.o := n
> > +
> > +- For disabling all files in one directory::
> > +
> > + PROPELLER_PROFILE := n
> > +
> > +
> > +Workflow
> > +========
> > +
> > +Here is an example workflow for building an AutoFDO+Propeller kernel:
> > +
> > +1) Assuming an AutoFDO profile is already collected following
> > + instructions in the AutoFDO document, build the kernel on the HOST
> > + machine, with AutoFDO and Propeller build configs ::
> > +
> > + CONFIG_AUTOFDO_CLANG=y
> > + CONFIG_PROPELLER_CLANG=y
> > +
> > + and ::
> > +
> > + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
> > +
> > +2) Install the kernel on the TEST machine.
>
>
> I keep encountering capitalized "HOST" and "TEST".
>
> Do these terms have a special meaning, as opposed to a test machine in general?
No special meaning. This is not intentional. Will fix this.
>
>
>
>
>
>
>
> > +
> > +3) Run the load tests. The '-c' option in perf specifies the sample
> > + event period. We suggest using a suitable prime number, like 500009,
> > + for this purpose.
> > +
> > + - For Intel platforms::
> > +
> > + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > + - For AMD platforms::
> > +
> > + $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > + Note you can repeat the above steps to collect multiple <perf_file>s.
> > +
> > +4) (Optional) Download the raw perf file(s) to the HOST machine.
> > +
> > +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
> > + generate Propeller profile. ::
> > +
> > + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
> > + --format=propeller --propeller_output_module_name \
> > + --out=<propeller_profile_prefix>_cc_profile.txt \
> > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
> > +
> > + "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
> > +
> > + This command generates a pair of Propeller profiles:
> > + "<propeller_profile_prefix>_cc_profile.txt" and
> > + "<propeller_profile_prefix>_ld_profile.txt".
> > +
> > + If there are more than 1 perf_file collected in the previous step,
> > + you can create a temp list file "<perf_file_list>" with each line
> > + containing one perf file name and run::
> > +
> > + $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
> > + --format=propeller --propeller_output_module_name \
> > + --out=<propeller_profile_prefix>_cc_profile.txt \
> > + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
> > +
> > +6) Rebuild the kernel using the AutoFDO and Propeller
> > + profiles. ::
>
>
> "." and "::" are an odd combination.
"::" is an rst marker. I will make sure the rendered text looks good.
>
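On the "<perf_file_list>" mentioned in step 5 above: a minimal sketch of building that list file, one perf file name per line, might look like this. The function name make_perf_list and the "*.data" naming of the collected perf files are assumptions for illustration only; adjust the pattern to however the perf files were actually named.

```shell
#!/bin/sh
# make_perf_list DIR OUT
# Hypothetical helper: writes the path of every *.data file found in DIR
# to OUT, one per line. The *.data suffix is an assumed naming convention.
make_perf_list() {
    dir=$1
    out=$2
    : > "$out"                      # start with an empty list file
    for f in "$dir"/*.data; do
        [ -e "$f" ] || continue     # skip the unexpanded pattern if no match
        printf '%s\n' "$f" >> "$out"
    done
}

# Example use:
#   make_perf_list /path/to/perf_files perf_file_list.txt
# The resulting file is what --profile=@<perf_file_list> refers to.
```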
>
>
>
> > +
> > + CONFIG_AUTOFDO_CLANG=y
> > + CONFIG_PROPELLER_CLANG=y
> > +
> > + and ::
> > +
> > + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
>
>
>
> > diff --git a/Makefile b/Makefile
> > index bbb6ec68f5dc..2d2f688c21c6 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1019,6 +1019,7 @@ include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
> > include-$(CONFIG_KCOV) += scripts/Makefile.kcov
> > include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
> > include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
> > +include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller
> > include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
> >
> > include $(addprefix $(srctree)/, $(include-y))
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 5e9604960cbb..fdeb5f173a10 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -831,6 +831,28 @@ config AUTOFDO_CLANG
> >
> > If unsure, say N.
> >
> > +config ARCH_SUPPORTS_PROPELLER_CLANG
> > + bool
> > +
> > +config PROPELLER_CLANG
> > + bool "Enable Clang's Propeller build"
> > + depends on ARCH_SUPPORTS_PROPELLER_CLANG
> > + depends on AUTOFDO_CLANG
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 190000
>
>
> CC_IS_CLANG is redundant, but I am fine if you want to have it explicitly.
Let's keep this just for clarity purposes.
>
>
>
> > + help
> > + This option enables Clang’s Propeller build which
> > + is on top of AutoFDO build. When the Propeller profiles
> > + is specified in variable CLANG_PROPELLER_PROFILE_PREFIX
> > + during the build process, Clang uses the profiles to
> > + optimize the kernel.
> > +
> > + If no profile is specified, Proepller options are
>
>
> "Proepller" is a typo.
Thanks! Will fix this.
>
>
>
>
> > + still passed to Clang to facilitate the collection
> > + of perf data for creating the Propeller profiles in
> > + subsequent builds.
> > +
> > + If unsure, say N.
> > +
> > config ARCH_SUPPORTS_CFI_CLANG
> > bool
> > help
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 503a0268155a..da47164bfddc 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -127,6 +127,7 @@ config X86
> > select ARCH_SUPPORTS_LTO_CLANG_THIN
> > select ARCH_SUPPORTS_RT
> > select ARCH_SUPPORTS_AUTOFDO_CLANG
> > + select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
> > select ARCH_USE_BUILTIN_BSWAP
> > select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
> > select ARCH_USE_MEMTEST
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index 6726be89b7a6..7ecc21c569be 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -442,6 +442,10 @@ SECTIONS
> >
> > STABS_DEBUG
> > DWARF_DEBUG
> > +#ifdef CONFIG_PROPELLER_CLANG
> > + .llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
> > +#endif
> > +
> > ELF_DETAILS
> >
> > DISCARDS
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index 20e46c0917db..5986dd4cfb14 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -95,14 +95,14 @@
> > * With LTO_CLANG, the linker also splits sections by default, so we need
> > * these macros to combine the sections during the final link.
> > *
> > - * With LTO_CLANG, the linker also splits sections by default, so we need
> > - * these macros to combine the sections during the final link.
> > + * CONFIG_AUTOFDO_CLANG and CONFIG_PROPELLER_CLANG will also split text sections
> > + * and cluster them in the linking time.
> > *
> > * RODATA_MAIN is not used because existing code already defines .rodata.x
> > * sections to be brought in with rodata.
> > */
> > #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
> > -defined(CONFIG_AUTOFDO_CLANG)
> > +defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
>
>
> If you have "depends on AUTOFDO_CLANG" in Kconfig,
> you do not need to touch this line.
>
> When CONFIG_PROPELLER_CLANG is enabled, CONFIG_AUTOFDO_CLANG is already defined.
We will remove the dependency of CONFIG_PROPELLER_CLANG on
CONFIG_AUTOFDO_CLANG,
so I guess we will keep this part.
>
>
>
>
> > #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
> > #else
> > #define TEXT_MAIN .text
> > @@ -556,7 +556,7 @@ defined(CONFIG_AUTOFDO_CLANG)
> > __cpuidle_text_end = .; \
> > __noinstr_text_end = .;
> >
> > -#ifdef CONFIG_AUTOFDO_CLANG
> > +#if defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
>
>
> Ditto.
>
>
> > #define TEXT_HOT \
> > __hot_text_start = .; \
> > *(.text.hot .text.hot.*) \
> > @@ -584,7 +584,7 @@ defined(CONFIG_AUTOFDO_CLANG)
> > * first when in these builds.
> > */
> > #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
> > -defined(CONFIG_AUTOFDO_CLANG)
> > +defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
>
>
> Ditto.
> This makes sense only when CONFIG_AUTOFDO_CLANG and CONFIG_PROPELLER_CLANG
> are independent of each other.
We will make CONFIG_AUTOFDO_CLANG and CONFIG_PROPELLER_CLANG
independent of each other.
>
>
>
> > #define TEXT_TEXT \
> > ALIGN_FUNCTION(); \
> > *(.text.asan.* .text.tsan.*) \
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index e85d6ac31bd9..60354c476956 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -201,6 +201,16 @@ _c_flags += $(if $(patsubst n%,, \
> > $(CFLAGS_AUTOFDO_CLANG))
> > endif
> >
> > +#
> > +# Enable Clang's Propeller build flags for a file or directory depending on
> > +# variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE.
>
> The same comment as in 1/6.
Will fix this.
>
>
>
> > +#
> > +ifeq ($(CONFIG_PROPELLER_CLANG),y)
>
>
>
> ifdef CONFIG_PROPELLER_CLANG
>
> would be simpler, as you used this style in scripts/Makefile.propeller
Will use the suggested code.
>
>
>
>
>
>
> > +_c_flags += $(if $(patsubst n%,, \
> > + $(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \
> > + $(CFLAGS_PROPELLER_CLANG))
> > +endif
> > +
> > # $(src) for including checkin headers from generated source files
> > # $(obj) for including generated headers from checkin source files
> > ifeq ($(KBUILD_EXTMOD),)
> > diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller
> > new file mode 100644
> > index 000000000000..344190717e47
> > --- /dev/null
> > +++ b/scripts/Makefile.propeller
>
>
> > +# Propeller requires debug information to embed module names in the profiles.
> > +# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO,
> > +# as the option should already be set.
> > +ifndef CONFIG_DEBUG_INFO
> > + ifndef CONFIG_AUTOFDO_CLANG
> > + CFLAGS_PROPELLER_CLANG += -gmlt
> > + endif
> > +endif
>
>
> This block is dead code due to "depends on AUTOFDO_CLANG".
>
> "ifndef CONFIG_AUTOFDO_CLANG" is never met here.
Yes. I think we will still need this block once we remove the dependency on
CONFIG_AUTOFDO_CLANG.
>
>
>
>
>
>
>
> --
> Best Regards
> Masahiro Yamada