[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <29591D3B-D49B-4D7A-B280-85A2C3F63F9C@vmware.com>
Date:   Thu, 4 Oct 2018 10:23:29 +0000
From:   Nadav Amit <namit@...are.com>
To:     Ingo Molnar <mingo@...nel.org>
CC:     "hpa@...or.com" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jan Beulich <JBeulich@...e.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v9 04/10] x86: refcount: prevent gcc distortions
at 2:45 AM, Ingo Molnar <mingo@...nel.org> wrote:
> 
> * Nadav Amit <namit@...are.com> wrote:
> 
>>> Another, separate question I wanted to ask: how do we ensure that the kernel stays fixed?
>>> I.e. is there some tooling we can use to actually measure whether there's bad inlining decisions 
>>> done, to detect all these bad patterns that cause bad GCC code generation?
>> 
>> Good question. First, I’ll indicate that this patch-set does not handle all
>> the issues. There is still the issue of conditional use of
>> __builtin_constant_p().
>> 
>> One indication for bad inlining decisions is the inlined functions have
>> multiple (non-inlined) instances in the binary and are short. I don’t
>> have an automatic solution, but you can try, for example to run:
>> 
>> nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \
>> 	grep -v '^      1' | sort -n -r | head -n 5
>> 
>> There are however many false positives. After these patches, for example, I
>> get:
>> 
>>     11 000000000000012f t jhash
>>      7 0000000000000017 t dst_output
>>      6 0000000000000011 t kzalloc
>>      5 000000000000002f t acpi_os_allocate_zeroed
>>      5 0000000000000029 t acpi_os_allocate
>> 
>> 
>> jhash() should not have been inlined in my mind, and should have a
>> non-inlined implementation. dst_output() is used as a function pointer.
>> kzalloc() and the next two suffer from the __builtin_constant_p() problem I
>> described in the past.
> 
> Ok, that's useful info.
> 
> The histogram suggests that with all your patches applied the kernel is now in a pretty good 
> state in terms of inlining decisions, right?
It was just an example that I ran on the kernel I built right now (with a
custom config). Please don’t regard these results as anything indicative.
> Are you using defconfig or a reasonable distro-config for your tests?
I think it is best to take the kernel and run localyesconfig for testing.
Taking Ubuntu 18.04 and doing the same gives the following results:
     12 000000000000012f t jhash
      7 0000000000000017 t dst_output
      7 0000000000000014 t init_once
      5 00000000000000d8 t jhash2
      5 000000000000004e t put_page
      5 000000000000002f t acpi_os_allocate_zeroed
      5 0000000000000029 t acpi_os_allocate
      5 0000000000000028 t umask_show
      5 0000000000000011 t kzalloc
      4 0000000000000053 t trace_xhci_dbg_quirks
Not awful, but not great.
It is a bit hard to fix the __builtin_constant_p() problem without having
some negative side-effects.
Reminder: __builtin_constant_p() is evaluated after the inlining decision
are done. You can use __builtin_choose_expr() instead of an “if”s and
instead of ternary operators when evaluating __builtin_constant_p() to solve
the problem. However, this causes the compiler not to know sometimes that a
value is constant because __builtin_choose_expr () evaluation happens too
early. This __builtin_choose_expr() problem is the reason for put_page() and
kzalloc() are not being inlined. Clang, again, does not suffer from this
problem.
Anyhow, it may be a good practice to try to get rid of the rest. For
example, dst_discard() has four instances because it is always given as a
function pointer. So it should not have been inlined.
Regards,
Nadav
Powered by blists - more mailing lists
 
