[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <29591D3B-D49B-4D7A-B280-85A2C3F63F9C@vmware.com>
Date: Thu, 4 Oct 2018 10:23:29 +0000
From: Nadav Amit <namit@...are.com>
To: Ingo Molnar <mingo@...nel.org>
CC: "hpa@...or.com" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Jan Beulich <JBeulich@...e.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v9 04/10] x86: refcount: prevent gcc distortions
at 2:45 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Nadav Amit <namit@...are.com> wrote:
>
>>> Another, separate question I wanted to ask: how do we ensure that the kernel stays fixed?
>>> I.e. is there some tooling we can use to actually measure whether there's bad inlining decisions
>>> done, to detect all these bad patterns that cause bad GCC code generation?
>>
>> Good question. First, I’ll indicate that this patch-set does not handle all
>> the issues. There is still the issue of conditional use of
>> __builtin_constant_p().
>>
>> One indication for bad inlining decisions is the inlined functions have
>> multiple (non-inlined) instances in the binary and are short. I don’t
>> have an automatic solution, but you can try, for example to run:
>>
>> nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \
>> grep -v '^ 1' | sort -n -r | head -n 5
>>
>> There are however many false positives. After these patches, for example, I
>> get:
>>
>> 11 000000000000012f t jhash
>> 7 0000000000000017 t dst_output
>> 6 0000000000000011 t kzalloc
>> 5 000000000000002f t acpi_os_allocate_zeroed
>> 5 0000000000000029 t acpi_os_allocate
>>
>>
>> jhash() should not have been inlined in my mind, and should have a
>> non-inlined implementation. dst_output() is used as a function pointer.
>> kzalloc() and the next two suffer from the __builtin_constant_p() problem I
>> described in the past.
>
> Ok, that's useful info.
>
> The histogram suggests that with all your patches applied the kernel is now in a pretty good
> state in terms of inlining decisions, right?
It was just an example that I ran on the kernel I built right now (with a
custom config). Please don’t regard these results as anything indicative.
> Are you using defconfig or a reasonable distro-config for your tests?
I think it is best to take the kernel and run localyesconfig for testing.
Taking Ubuntu 18.04 and doing the same gives the following results:
12 000000000000012f t jhash
7 0000000000000017 t dst_output
7 0000000000000014 t init_once
5 00000000000000d8 t jhash2
5 000000000000004e t put_page
5 000000000000002f t acpi_os_allocate_zeroed
5 0000000000000029 t acpi_os_allocate
5 0000000000000028 t umask_show
5 0000000000000011 t kzalloc
4 0000000000000053 t trace_xhci_dbg_quirks
Not awful, but not great.
It is a bit hard to fix the __builtin_constant_p() problem without having
some negative side-effects.
Reminder: __builtin_constant_p() is evaluated after the inlining decision
are done. You can use __builtin_choose_expr() instead of an “if”s and
instead of ternary operators when evaluating __builtin_constant_p() to solve
the problem. However, this causes the compiler not to know sometimes that a
value is constant because __builtin_choose_expr () evaluation happens too
early. This __builtin_choose_expr() problem is the reason for put_page() and
kzalloc() are not being inlined. Clang, again, does not suffer from this
problem.
Anyhow, it may be a good practice to try to get rid of the rest. For
example, dst_discard() has four instances because it is always given as a
function pointer. So it should not have been inlined.
Regards,
Nadav
Powered by blists - more mailing lists