[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1629983260.5jkx2jf0y8.astroid@bobo.none>
Date: Thu, 26 Aug 2021 23:57:52 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Segher Boessenkool <segher@...nel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Christophe Leroy <christophe.leroy@...roup.eu>,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
Michael Ellerman <mpe@...erman.id.au>,
Paul Mackerras <paulus@...ba.org>
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and
WARN_ON() on PPC32
Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
>
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> >
>> > I thought only *taken* branches are just one per cycle?
>>
>> Taken branches are fetched by the front end at one per cycle (assuming
>> they hit the BTAC), but all branches have to be executed by BR at one
>> per cycle
>
> This is not true. (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues. And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.
No, they are all dispatched and issue to the BRU for execution. It's
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.
> The BTAC is a frontend thing, used for target address prediction. It
> does not limit execution.
I didn't say it did.
> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units). Incorrectly
> predicted branches the same, but those cause a redirect and refetch.
How could it validate prediction without issuing? It wouldn't know when
sources are ready.
>
>> > Internally *all* traps are conditional, in GCC. It also can optimise
>> > them quite well. There must be something in the kernel macros that
>> > prevents good optimisation.
>>
>> I did take a look at it at one point.
>>
>> One problem is that the kernel needs the address of the trap instruction
>> to create the entry for it. The other problem is that __builtin_trap
>> does not return so it can't be used for WARN. LLVM at least seems to
>> have a __builtin_debugtrap which does return.
>
> This is <https://gcc.gnu.org/PR99299>.
Aha.
>> The first problem seems like the show stopper though. AFAIKS it would
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
>
> I'm not quite sure what this means. Can't you always just put a
>
> bla: asm("");
>
> in there, and use the address of "bla"?
Not AFAIKS. Put it where?
> If not, you need to say a lot
> more about what you actually want to do :-/
We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:
__builtin_trap();
asm ("1: \n\t"
" .section __bug_table,\"aw\" \n\t"
"2: .4byte 1b - 2b - 4 \n\t"
" .previous");
Where the 1: label must follow immediately after the trap instruction,
and where the asm must be emitted even for the case of no-return traps
(but anything following the asm could be omitted).
Thanks,
Nick
Powered by blists - more mailing lists