linux-kernel - Re: [GIT PULL] Clang feature updates for v5.14-rc1


Message-ID: <CAKwvOdkcKU4K9LWTymmzi_c0wKPTQjWEbNu04WOd6D-EcnWDSg@mail.gmail.com>
Date:   Tue, 29 Jun 2021 14:27:34 -0700
From:   Nick Desaulniers <ndesaulniers@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Kees Cook <keescook@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Bill Wendling <morbo@...gle.com>,
        Bill Wendling <wcw@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        clang-built-linux <clang-built-linux@...glegroups.com>,
        Fangrui Song <maskray@...gle.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Jarmo Tiitto <jarmo.tiitto@...il.com>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Mark Rutland <mark.rutland@....com>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Miguel Ojeda <ojeda@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Peter Oberparleiter <oberpar@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Will Deacon <will@...nel.org>
Subject: Re: [GIT PULL] Clang feature updates for v5.14-rc1

On Tue, Jun 29, 2021 at 2:04 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Tue, Jun 29, 2021 at 1:44 PM Kees Cook <keescook@...omium.org> wrote:
> > >
> > > And it causes the kernel to be bigger and run slower.
> >
> > Right -- that's expected. It's not designed to be the final kernel
> > someone uses. :)
>
> Well, from what I've seen, you actually want to run real loads in
> production environments for PGO to actually be anything but a bogus
> "performance benchmarks only" kind of thing.
>
> Of course, "performance benchmarks only" is very traditional, and
> we've seen that used over and over in the past in this industry. That
> doesn't make it _right_, though.

The current major use case is ensuring that production kernels have
been "trained" with specific workloads in mind.

> And if you actually want to have it usable in production environments,
> you really should strive to run code as closely as possible to a
> production kernel too.

You could do both.  There is a line of research internally using
multiple training rounds ("CSPGO").

> You'd want to run something that you can sample over time, and in
> production, not something that you have to build a special kernel for
> that then gets used for a benchmark run, but can't be kept in
> production because it performs so much worse.
>
> Real proper profiles will tell you what *really* matters - and if you
> don't have enough samples to give you good information, then that
> particular code clearly is not important enough to waste PGO on.
>
> This is not all that dissimilar to using gprof information for
> traditional - manual - optimizations.
>
> Sure, instrumented gprof output is better than nothing, but it is
> *hugely* worse than actual proper sampled profiles that actually show
> what matters for performance (as opposed to what runs a lot - the two
> are not necessarily all that closely correlated, with cache misses
> being a thing).
>
> And I really hate how pretty much all of the PGO support seems to be
> just about this inferior method of getting the data.

Right now we're having trouble with hardware performance counters on
non-Intel chips; I don't think we have working LBR equivalents on AMD
until Zen 3, and our ETM-based samples on ARM are hung up on a few
last-minute issues requiring new hardware (from multiple different
chipset vendors).

It would be good to have some form of profile-based optimization that
isn't architecture- or microarchitecture-dependent.
-- 
Thanks,
~Nick Desaulniers