Message-ID: <CAF1bQ=SiHi8oCyo5YnXGpQGofM1zAsnBdqSEet1mS-BYNKVU8A@mail.gmail.com>
Date: Mon, 9 Dec 2024 09:30:50 -0800
From: Rong Xu <xur@...gle.com>
To: Will Deacon <will@...nel.org>
Cc: Yabin Cui <yabinc@...gle.com>, Han Shen <shenhan@...gle.com>,
Jonathan Corbet <corbet@....net>, Catalin Marinas <catalin.marinas@....com>,
Masahiro Yamada <masahiroy@...nel.org>, Kees Cook <kees@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>, workflows@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
The support code is in commit 315ad8780a129e82 ("kbuild: Add AutoFDO
support for Clang build").
Even if the user selects it, CONFIG_AUTOFDO_CLANG will not be enabled
unless the architecture selects ARCH_SUPPORTS_AUTOFDO_CLANG.
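
As a rough illustration of that gating, here is a simplified sketch of the
relevant Kconfig entries. It is not a verbatim copy of the commit above: the
prompt text is abbreviated, and any further dependencies (for example a
minimum Clang version) and the help text are left out.

```kconfig
# Simplified sketch, not the exact upstream text.

# Hidden symbol: it has no prompt, so it can only be turned on
# by an architecture doing "select ARCH_SUPPORTS_AUTOFDO_CLANG".
config ARCH_SUPPORTS_AUTOFDO_CLANG
	bool

# User-visible option: it stays unavailable unless the hidden
# symbol above has been selected by the architecture.
config AUTOFDO_CLANG
	bool "Enable Clang's AutoFDO build"
	depends on ARCH_SUPPORTS_AUTOFDO_CLANG
	depends on CC_IS_CLANG
```

With a layout like this, the only per-architecture change needed is the
one-line "select" in the architecture's Kconfig, which is what the quoted
patch adds for arm64.
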
We are not enabling this for all architectures because an AutoFDO-optimized
build relies on Last Branch Records (LBR), which aren't available on all
architectures.
-Rong
On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@...nel.org> wrote:
>
> On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> > selected.
> >
> > On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> > Experiments on Android show 4% improvement in cold app startup time
> > and 13% improvement in binder benchmarks.
> >
> > Signed-off-by: Yabin Cui <yabinc@...gle.com>
> > ---
> >
> > Change-Logs in V2:
> >
> > 1. Use "For ARM platforms with ETM trace" in autofdo.rst.
> > 2. Create an issue and a change to use extbinary format in instructions:
> > https://github.com/Linaro/OpenCSD/issues/65
> > https://android-review.googlesource.com/c/platform/system/extras/+/3362107
> >
> > Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
> > arch/arm64/Kconfig | 1 +
> > 2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
> > index 1f0a451e9ccd..a890e84a2fdd 100644
> > --- a/Documentation/dev-tools/autofdo.rst
> > +++ b/Documentation/dev-tools/autofdo.rst
> > @@ -55,7 +55,7 @@ process consists of the following steps:
> > workload to gather execution frequency data. This data is
> > collected using hardware sampling, via perf. AutoFDO is most
> > effective on platforms supporting advanced PMU features like
> > - LBR on Intel machines.
> > + LBR on Intel machines, ETM traces on ARM machines.
> >
> > #. AutoFDO profile generation: Perf output file is converted to
> > the AutoFDO profile via offline tools.
> > @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
> >
> > $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> >
> > + - For ARM platforms with ETM trace:
> > +
> > + Follow the instructions in the `Linaro OpenCSD document
> > + <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
> > + to record ETM traces for AutoFDO::
> > +
> > + $ perf record -e cs_etm/@..._etr0/k -a -o <etm_perf_file> -- <loadtest>
> > + $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
> > +
> > + For ARM platforms running Android, follow the instructions in the
> > + `Android simpleperf document
> > + <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
> > + to record ETM traces for AutoFDO::
> > +
> > + $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
> > +
> > 4) (Optional) Download the raw perf file to the host machine.
> >
> > 5) To generate an AutoFDO profile, two offline tools are available:
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index fd9df6dcc593..c3814df5e391 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -103,6 +103,7 @@ config ARM64
> > select ARCH_SUPPORTS_PER_VMA_LOCK
> > select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > select ARCH_SUPPORTS_RT
> > + select ARCH_SUPPORTS_AUTOFDO_CLANG
> > select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > select ARCH_WANT_DEFAULT_BPF_JIT
>
> After this change, both arm64 and x86 select this option unconditionally
> and with no apparent support code being added. So what is actually
> required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> it just available for all architectures instead?
>
> Will