[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.21.2001221556160.15957@pobox.suse.cz>
Date: Wed, 22 Jan 2020 16:05:27 +0100 (CET)
From: Miroslav Benes <mbenes@...e.cz>
To: Josh Poimboeuf <jpoimboe@...hat.com>
cc: Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Joe Lawrence <joe.lawrence@...hat.com>,
Jessica Yu <jeyu@...nel.org>, x86@...nel.org,
linux-kernel@...r.kernel.org, mhiramat@...nel.org,
bristot@...hat.com, jbaron@...mai.com,
torvalds@...ux-foundation.org, tglx@...utronix.de,
mingo@...nel.org, namit@...are.com, hpa@...or.com, luto@...nel.org,
ard.biesheuvel@...aro.org, live-patching@...r.kernel.org,
Randy Dunlap <rdunlap@...radead.org>, mjambor@...e.cz
Subject: Re: [PATCH v3 5/6] x86/ftrace: Use text_poke()
On Wed, 22 Jan 2020, Miroslav Benes wrote:
> On Tue, 21 Jan 2020, Josh Poimboeuf wrote:
>
> > On Tue, Jan 21, 2020 at 09:35:28AM +0100, Miroslav Benes wrote:
> > > On Mon, 20 Jan 2020, Josh Poimboeuf wrote:
> > >
> > > > On Mon, Oct 21, 2019 at 10:05:49AM -0500, Josh Poimboeuf wrote:
> > > > > On Wed, Oct 16, 2019 at 09:42:17AM +0200, Peter Zijlstra wrote:
> > > > > > > which are not compatible with livepatching. GCC upstream now has
> > > > > > > -flive-patching option, which disables all those interfering optimizations.
> > > > > >
> > > > > > Which, IIRC, has a significant performance impact and should thus really
> > > > > > not be used...
> > > > > >
> > > > > > If distros ship that crap, I'm going to laugh at them the next time they
> > > > > > want a single digit performance improvement because *important*.
> > > > >
> > > > > I have a crazy plan to try to use objtool to detect function changes at
> > > > > a binary level, which would hopefully allow us to drop this flag.
> > > > >
> > > > > But regardless, I wonder if we enabled this flag prematurely. We still
> > > > > don't have a reasonable way to use it for creating source-based live
> > > > > patches upstream, and it should really be optional for CONFIG_LIVEPATCH,
> > > > > since kpatch-build doesn't need it.
> > > >
> > > > I also just discovered that -flive-patching is responsible for all those
> > > > "unreachable instruction" objtool warnings which Randy has been
> > > > dutifully bugging me about over the last several months. For some
> > > > reason it subtly breaks GCC implicit noreturn detection for local
> > > > functions.
> > >
> > > Ugh, that is unfortunate. Have you reported it?
> >
> > Not yet (but I plan to).
>
> My findings so far...
>
> I bisected through GCC options which -flive-patching disables and
> -fno-ipa-pure-const is the culprit. I got no warnings without the option
> with my config.
>
> Then I found out allmodconfig was ok even with -flive-patching.
> CONFIG_GCOV is the difference. CONFIG_GCOV=y seems to make the warnings go
> away here.
Sorry, that was a red herring. See 867ac9d73709 ("objtool: Fix gcov check
for older versions of GCC").
I started looking at some btrfs reports and then found out those were
already fixed.
https://lore.kernel.org/linux-btrfs/cd4091e4-1c04-a880-f239-00bc053f46a2@infradead.org/
arch/x86/kernel/cpu/mce/core.o: warning: objtool: mce_panic()+0x11b: unreachable instruction
was next...
Broken code (-fno-ipa-pure-const):
...
1186: e8 a5 fe ff ff callq 1030 <wait_for_panic>
118b: e9 23 ff ff ff jmpq 10b3 <mce_panic+0x43>
</end of function>
Working code (-fipa-pure-const):
753: e8 88 fe ff ff callq 5e0 <wait_for_panic>
758: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
75f: 00
mce_panic() has:
if (atomic_inc_return(&mce_panicked) > 1)
wait_for_panic();
barrier();
bust_spinlocks(1);
jmpq in the broken code goes to bust_spinlocks(1), because GCC does not
know that wait_for_panic() is noreturn... because it is not.
wait_for_panic() calls panic() unconditionally in the end, which is
noreturn.
So the question is why ipa-pure-const optimization knows about panic()'s
noreturn. The answer is that it is right one of the things the
optimization does. It propagates inner noreturns to its callers. (Martin
Jambor CCed).
Marking wait_for_panic() as noreturn (__noreturn), of course, fixes it
then. Now I don't know what the right fix should be. Should we mark all
these sites as noreturn, or is it ok for the kernel to rely on GCC
behaviour in this case? Could we teach objtool to recognize this?
Miroslav
Powered by blists - more mailing lists