linux-kernel - Re: [PATCH -tip v3 09/11] data_race: Avoid nested statement expression

Message-ID: <CAK8P3a3UYQeXhiufUevz=rwe09WM_vSTCd9W+KvJHJcOeQyWVA@mail.gmail.com>
Date:   Wed, 27 May 2020 01:10:00 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Marco Elver <elver@...gle.com>
Cc:     Will Deacon <will@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        Andrey Konovalov <andreyknvl@...gle.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        clang-built-linux <clang-built-linux@...glegroups.com>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH -tip v3 09/11] data_race: Avoid nested statement expression

On Tue, May 26, 2020 at 9:00 PM Arnd Bergmann <arnd@...db.de> wrote:
>
> On Tue, May 26, 2020 at 7:33 PM 'Marco Elver' via Clang Built Linux
> <clang-built-linux@...glegroups.com> wrote:
> > On Tue, 26 May 2020, Marco Elver wrote:
> > > On Tue, 26 May 2020 at 14:19, Arnd Bergmann <arnd@...db.de> wrote:
> > > Note that an 'allyesconfig' selects KASAN and not KCSAN by default.
> > > But I think that's not relevant, since KCSAN-specific code was removed
> > > from ONCEs. In general though, it is entirely expected that we have a
> > > bit longer compile times when we have the instrumentation passes
> > > enabled.
> > >
> > > But as you pointed out, that's irrelevant, and the significant
> > > overhead is from parsing and pre-processing. FWIW, we can probably
> > > optimize Clang itself a bit:
> > > https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
> >
> > Found that optimizing __unqual_scalar_typeof makes a noticeable
> > difference. We could use C11's _Generic if the compiler supports it (and
> > all supported versions of Clang certainly do).
> >
> > Could you verify if the below patch improves compile-times for you? E.g.
> > on fs/ocfs2/journal.c I was able to get ~40% compile-time speedup.
>
> Yes, that brings both the preprocessed size and the time to preprocess it
> with clang-11 back to where it is in mainline, and close to the speed with
> gcc-10 for this particular file.
>
> I also cross-checked with gcc-4.9 and gcc-10 and found that they do see
> the same increase in the preprocessor output, but it makes little difference
> for preprocessing performance on gcc.
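For readers following the thread, the `_Generic` idea in the quoted text can be sketched roughly as below. This is only an illustration, assuming a compiler with C11 `_Generic` and the GNU `__typeof__` extension. The names `SCALAR_CASES`, `UNQUAL_SCALAR_TYPEOF` and `demo_unqual` are made up here and are not taken from the patch under discussion. It relies on `_Generic` applying lvalue conversion to its controlling expression, which drops `const` and `volatile` before the type is matched.

```c
/* Hypothetical helper: one association each for the signed and unsigned variant. */
#define SCALAR_CASES(type)                      \
	unsigned type: (unsigned type)0,        \
	signed type: (signed type)0

/*
 * Hypothetical sketch: yields the unqualified type of a scalar x.
 * Non-scalar types (structs, pointers, ...) fall through to "default"
 * and keep whatever qualifiers they had.
 */
#define UNQUAL_SCALAR_TYPEOF(x) __typeof__(     \
	_Generic((x),                           \
		 char: (char)0,                 \
		 SCALAR_CASES(char),            \
		 SCALAR_CASES(short),           \
		 SCALAR_CASES(int),             \
		 SCALAR_CASES(long),            \
		 SCALAR_CASES(long long),       \
		 default: (x)))

/* Returns 1 if the qualifiers were stripped and the copy is writable. */
int demo_unqual(void)
{
	const volatile int cv = 5;
	UNQUAL_SCALAR_TYPEOF(cv) copy = cv;	/* declared as plain int */

	copy += 1;	/* would not compile if copy were still const */
	return _Generic(&copy, int *: 1, default: 0) && copy == 6;
}
```

The point of this shape is that the compiler resolves the type in a single selection instead of expanding a long chain of nested type comparisons for every use.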

Just for reference, I've tested this against a patch I made that completely
shortcuts READ_ONCE() on anything but alpha (which needs the
read_barrier_depends()):

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -224,18 +224,21 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * atomicity or dependency ordering guarantees. Note that this may result
  * in tears!
  */
-#define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x))
+#define __READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

+#ifdef CONFIG_ALPHA /* smp_read_barrier_depends is a NOP otherwise */
 #define __READ_ONCE_SCALAR(x)                                          \
 ({                                                                     \
        __unqual_scalar_typeof(x) __x = __READ_ONCE(x);                 \
        smp_read_barrier_depends();                                     \
-       (typeof(x))__x;                                                 \
+       __x;                                                            \
 })
+#else
+#define __READ_ONCE_SCALAR(x) __READ_ONCE(x)
+#endif

 #define READ_ONCE(x)                                                   \
 ({                                                                     \

In the configuration I posted earlier, this produces noticeably faster
build times than the unpatched tree, but your patch gets most of the way
there: https://pastebin.com/pCwALmUD
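As a rough illustration of what the non-Alpha path in the patch above boils down to, here is a minimal userspace sketch, assuming GNU C for `__typeof__`. The names `READ_ONCE_SKETCH`, `shared_counter` and `read_counter_plus_one` are hypothetical. The macro is a plain expression (one volatile load) with no statement expression and no temporary, so there is much less for the preprocessor and parser to expand.

```c
/*
 * Hypothetical stand-in for the shortcut: a single volatile load
 * written as an ordinary expression. No barrier is issued, which is
 * only acceptable where smp_read_barrier_depends() is a no-op.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Example variable that another thread could update concurrently. */
int shared_counter = 41;

/* Loads shared_counter exactly once, then uses the loaded value. */
int read_counter_plus_one(void)
{
	return READ_ONCE_SKETCH(shared_counter) + 1;
}
```

Because the result is used directly as an rvalue, the qualifiers carried along by `__typeof__(x)` do not matter here, which is why the sketch gets away without the unqualified-type helper.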

Looking just at the "task-clock" output from 'perf stat make vmlinux -skj30':

                 clang-11          gcc-9
linux-next     6939594.65 msec   4191482.92 msec
Marco's patch  5399261.82 msec   3800409.58 msec
Arnd's patch   5273888.54 msec   3584550.23 msec
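
To put the table in relative terms, the savings against linux-next can be computed with a small helper like the one below. This is just a sketch for the arithmetic: the function names are made up, and the constants are copied from the table above.

```c
#include <stdio.h>

/* Percentage of task-clock time saved relative to the baseline. */
double percent_saved(double baseline_msec, double patched_msec)
{
	return 100.0 * (baseline_msec - patched_msec) / baseline_msec;
}

/* Prints the relative savings for both compilers and both patches. */
void print_task_clock_savings(void)
{
	const double clang_next = 6939594.65, gcc_next = 4191482.92;

	printf("clang-11: Marco %.1f%%, Arnd %.1f%%\n",
	       percent_saved(clang_next, 5399261.82),
	       percent_saved(clang_next, 5273888.54));
	printf("gcc-9:    Marco %.1f%%, Arnd %.1f%%\n",
	       percent_saved(gcc_next, 3800409.58),
	       percent_saved(gcc_next, 3584550.23));
}
```

By this measure both patches save somewhere around 22% to 24% of CPU time with clang-11 and roughly 9% to 14% with gcc-9.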

        Arnd