[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGG=3QXrOnN3-3r-VGDpmikKRpaK_p5hM_qHGprDDMuFNKuifA@mail.gmail.com>
Date: Fri, 2 Jul 2021 05:46:46 -0700
From: Bill Wendling <morbo@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Kees Cook <keescook@...omium.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Bill Wendling <wcw@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
clang-built-linux <clang-built-linux@...glegroups.com>,
Fangrui Song <maskray@...gle.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Jarmo Tiitto <jarmo.tiitto@...il.com>,
Lukas Bulwahn <lukas.bulwahn@...il.com>,
Mark Rutland <mark.rutland@....com>,
Masahiro Yamada <masahiroy@...nel.org>,
Miguel Ojeda <ojeda@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Peter Oberparleiter <oberpar@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
Will Deacon <will@...nel.org>
Subject: Re: [GIT PULL] Clang feature updates for v5.14-rc1
On Tue, Jun 29, 2021 at 2:04 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, Jun 29, 2021 at 1:44 PM Kees Cook <keescook@...omium.org> wrote:
> > >
> > > And it causes the kernel to be bigger and run slower.
> >
> > Right -- that's expected. It's not designed to be the final kernel
> > someone uses. :)
>
> Well, from what I've seen, you actually want to run real loads in
> production environments for PGO to actually be anything but a bogus
> "performance benchmarks only" kind of thing.
>
The reason we use PGO in this way is because we _cannot_ release a
kernel into production that hasn't had PGO applied to it. The
performance of a non-PGO'ed kernel is a non-starter for rollout. We
try our best to replicate this environment for the benchmarks, which
is the only sane way to do this. I can't imagine that we're the only
ones who run up against this chicken-and-egg problem.
For why we don't use sampling, PGO gives a better performance boost
from an instrumented kernel rather than a sampled profile. I'll work
on getting statistics to show this.
-bw
> Of course, "performance benchmarks only" is very traditional, and
> we've seen that used over and over in the past in this industry. That
> doesn't make it _right_, though.
>
> And if you actually want to have it usable in production environments,
> you really should strive to run code as closely as possible to a
> production kernel too.
>
> You'd want to run something that you can sample over time, and in
> production, not something that you have to build a special kernels for
> that then gets used for a benchmark run, but can't be kept in
> production because it performs so much worse.
>
> Real proper profiles will tell you what *really* matters - and if you
> don't have enough samples to give you good information, then that
> particular code clearly is not important enough to waste PGO on.
>
> This is not all that dissimilar to using gprof information for
> traditional - manual - optimizations.
>
> Sure, instrumented gprof output is better than nothing, but it is
> *hugely* worse than actual proper sampled profiles that actually show
> what matters for performance (as opposed to what runs a lot - the two
> are not necessarily all that closely correlated, with cache misses
> being a thing).
>
> And I really hate how pretty much all of the PGO support seems to be
> just about this inferior method of getting the data.
>
> Linus
Powered by blists - more mailing lists