Message-ID: <D42AB116-BFA4-4CF0-BF45-41C579A9B4C7@vmware.com>
Date:   Thu, 4 Oct 2018 09:30:15 +0000
From:   Nadav Amit <namit@...are.com>
To:     Ingo Molnar <mingo@...nel.org>
CC:     "hpa@...or.com" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jan Beulich <JBeulich@...e.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v9 04/10] x86: refcount: prevent gcc distortions

at 2:12 AM, Ingo Molnar <mingo@...nel.org> wrote:

> 
> * Nadav Amit <namit@...are.com> wrote:
> 
>> I can run some tests. (@hpa: I thought you asked about the -pipe overhead;
>> perhaps I misunderstood).
> 
> Well, tests are unlikely to show the overhead of extra lines of this
> magnitude, unless done very carefully, yet the added bloat exists and is not even
> mentioned by the changelog, it just says:
> 
>  Subject: [PATCH v9 02/10] Makefile: Prepare for using macros for inline asm
> 
>  Using macros for inline assembly improves both readability and
>  compilation decisions that are distorted by big assembly blocks that use
>  alternative sections. Compile macros.S and use it to assemble all C
>  files. Currently, only x86 will use it.
> 
>> I guess you are referring to the preprocessing of the assembler. Note that the C 
>> preprocessing of macros.S obviously happens only once. That’s the reason
>> I assumed it’s not that expensive.
> 
> True - so first we build macros.s, and that gets included in every C file build, right?

Right.

> macros.s is smaller: 275 lines only in the distro test build I tried, which looks
> a lot better than my first 4,200 lines guesstimate.
> 
>> Anyhow, I remember that we discussed at some point doing something like
>> 'asm(".include XXX.s")' and somebody said it is not good, but I don’t
>> remember why and don’t see any reason it is so. Unless I am missing
>> something, I think it is possible to take each individual header and
>> preprocess the assembly part of it into a separate .s file. Then we can put in
>> the C part of the header 'asm(".include XXX.s")'.
>> 
>> What do you think?
> 
> Hm, this looks quite complex - macros.s is better I think. Also, 275 straight assembly lines is 
> a lot better than 4,200.

I don’t feel strongly about it, and hpa reminded me why it wouldn’t work. For
some reason I thought the order of macros doesn’t matter in asm (I probably
should go to sleep).

> Another, separate question I wanted to ask: how do we ensure that the kernel stays fixed?
> I.e. is there some tooling we can use to actually measure whether bad inlining decisions 
> are being made, to detect all these bad patterns that cause bad GCC code generation?

Good question. First, I should note that this patch set does not handle all
the issues. There is still the issue of conditional use of
__builtin_constant_p().

One indication of bad inlining decisions is that functions meant to be
inlined are short yet have multiple (non-inlined) instances in the binary.
I don’t have an automatic solution, but you can try, for example, running:

nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \
	grep -v '^      1' | sort -n -r | head -n 5

There are, however, many false positives. After these patches, for example, I
get:

     11 000000000000012f t jhash
      7 0000000000000017 t dst_output
      6 0000000000000011 t kzalloc
      5 000000000000002f t acpi_os_allocate_zeroed
      5 0000000000000029 t acpi_os_allocate


In my opinion, jhash() should not have been inlined and should have a
non-inlined implementation. dst_output() is used as a function pointer, so
its out-of-line copies are expected. kzalloc() and the next two suffer from
the __builtin_constant_p() problem I described in the past.
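
The nm pipeline above could also be written as a small script, which makes
it easier to add a size cutoff for "short" functions. The following is only
a rough sketch: the function names and the max_size default (0x200 bytes)
are assumptions made for illustration and were not part of the original
one-liner. Like the pipeline, it groups local text symbols by identical size
and name, so it has the same false positives.

```python
import subprocess
from collections import Counter


def parse_local_text_symbols(nm_output):
    """Yield (size, name) for each local text symbol in `nm --print-size` output."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Sized symbols print four fields: address, size, type, name.
        # Type 't' is a local (static) symbol in the text section.
        if len(fields) == 4 and fields[2] == "t":
            yield int(fields[1], 16), fields[3]


def duplicated_short_functions(nm_output, max_size=0x200, top=5):
    """Return up to `top` (count, size, name) tuples, most duplicated first.

    max_size is an arbitrary cutoff chosen for this sketch.
    """
    counts = Counter(parse_local_text_symbols(nm_output))
    dups = [
        (count, size, name)
        for (size, name), count in counts.items()
        if count > 1 and size <= max_size
    ]
    # Highest count first; ties are broken by size, then name.
    dups.sort(key=lambda d: (-d[0], d[1], d[2]))
    return dups[:top]


def scan(path="./vmlinux"):
    """Run nm on `path` and print candidates in the same layout as the pipeline."""
    out = subprocess.run(
        ["nm", "--print-size", path],
        check=True, capture_output=True, text=True,
    ).stdout
    for count, size, name in duplicated_short_functions(out):
        print(f"{count:7d} {size:016x} t {name}")
```

Calling scan("./vmlinux") prints one line per candidate. Symbols that are
legitimately emitted out of line, such as functions whose address is taken,
still need to be filtered out by hand.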

Regards,
Nadav
