Message-ID: <CAKwvOdkcKU4K9LWTymmzi_c0wKPTQjWEbNu04WOd6D-EcnWDSg@mail.gmail.com>
Date: Tue, 29 Jun 2021 14:27:34 -0700
From: Nick Desaulniers <ndesaulniers@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Kees Cook <keescook@...omium.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Bill Wendling <morbo@...gle.com>,
Bill Wendling <wcw@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
clang-built-linux <clang-built-linux@...glegroups.com>,
Fangrui Song <maskray@...gle.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Jarmo Tiitto <jarmo.tiitto@...il.com>,
Lukas Bulwahn <lukas.bulwahn@...il.com>,
Mark Rutland <mark.rutland@....com>,
Masahiro Yamada <masahiroy@...nel.org>,
Miguel Ojeda <ojeda@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Peter Oberparleiter <oberpar@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
Will Deacon <will@...nel.org>
Subject: Re: [GIT PULL] Clang feature updates for v5.14-rc1
On Tue, Jun 29, 2021 at 2:04 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, Jun 29, 2021 at 1:44 PM Kees Cook <keescook@...omium.org> wrote:
> > >
> > > And it causes the kernel to be bigger and run slower.
> >
> > Right -- that's expected. It's not designed to be the final kernel
> > someone uses. :)
>
> Well, from what I've seen, you actually want to run real loads in
> production environments for PGO to actually be anything but a bogus
> "performance benchmarks only" kind of thing.
>
> Of course, "performance benchmarks only" is very traditional, and
> we've seen that used over and over in the past in this industry. That
> doesn't make it _right_, though.

The current major use case is ensuring that production kernels have
been "trained" with specific workloads in mind.

> And if you actually want to have it usable in production environments,
> you really should strive to run code as closely as possible to a
> production kernel too.

You could do both. There is a line of research internally using
multiple training rounds ("CSPGO").

> You'd want to run something that you can sample over time, and in
> production, not something that you have to build a special kernels for
> that then gets used for a benchmark run, but can't be kept in
> production because it performs so much worse.
>
> Real proper profiles will tell you what *really* matters - and if you
> don't have enough samples to give you good information, then that
> particular code clearly is not important enough to waste PGO on.
>
> This is not all that dissimilar to using gprof information for
> traditional - manual - optimizations.
>
> Sure, instrumented gprof output is better than nothing, but it is
> *hugely* worse than actual proper sampled profiles that actually show
> what matters for performance (as opposed to what runs a lot - the two
> are not necessarily all that closely correlated, with cache misses
> being a thing).
>
> And I really hate how pretty much all of the PGO support seems to be
> just about this inferior method of getting the data.

Right now we're having trouble with hardware performance counters on
non-Intel chips; I don't think we have working LBR equivalents on AMD
until Zen 3, and our ETM-based samples on ARM are hung up on a few
last-minute issues requiring new hardware (from multiple different
chipset vendors).

It would be good to have some form of profile-based optimization that
isn't architecture- or microarchitecture-dependent.

--
Thanks,
~Nick Desaulniers