Message-ID: <3b7f3a1e-0355-b6d4-14cd-300bf4d3629a@csgroup.eu>
Date: Thu, 11 Feb 2021 13:26:12 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Segher Boessenkool <segher@...nel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>, npiggin@...il.com,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()
On 11/02/2021 at 12:49, Segher Boessenkool wrote:
> On Thu, Feb 11, 2021 at 07:41:52AM +0000, Christophe Leroy wrote:
>> powerpc BUG_ON() is based on the twnei or tdnei instruction,
>> which obliges gcc to format the condition into a 0 or 1 value
>> in a register.
>
> Huh? Why is that?
>
> Will it work better if this used __builtin_trap? Or does the kernel only
> detect very specific forms of trap instructions?
>
>> By using a generic implementation, gcc will generate a branch
>> to the unconditional trap generated by BUG().
>
> That is many more instructions than ideal.
>
>> As modern powerpc CPUs implement branch folding, that's even more efficient.
>
> What PowerPC cpus implement branch folding? I know none.
Extract from the PowerPC MPC8323 reference manual:
High instruction and data throughput
— Zero-cycle branch capability (branch folding)
— Programmable static branch prediction on unresolved conditional branches
— Two integer units with enhanced multipliers in the e300c2 for increased integer instruction throughput and a maximum two-cycle latency for multiply instructions
— Instruction fetch unit capable of fetching two instructions per clock from the instruction cache
— A six-entry instruction queue (IQ) that provides lookahead capability
— Independent pipelines with feed-forwarding that reduces data dependencies in hardware
— 16-Kbyte, four-way set-associative instruction and data caches on the e300c2.
— Cache write-back or write-through operation programmable on a per-page or per-block basis
— Features for instruction and data cache locking and protection
— BPU that performs CR lookahead operations
— Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte segment size
— A 64-entry, two-way, set-associative ITLB and DTLB
— Eight-entry data and instruction BAT arrays providing 128-Kbyte to 256-Mbyte blocks
— Software table search operations and updates supported through fast trap mechanism
— 52-bit virtual address; 32-bit physical address
Christophe
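For readers following the BUG_ON() discussion, here is a minimal, portable C sketch of the two styles being compared. It assumes GCC or Clang (for __builtin_expect) and replaces real trap instructions with a hypothetical fake_bug() counter so it runs on any host; fake_bug(), trap_if_nonzero(), BUG_ON_GENERIC() and BUG_ON_TRAP_STYLE() are illustrative names, not kernel APIs. It models the point made in the patch description: a twnei/tdnei-style trap needs the condition as an integer value in a register, while the generic form lets the compiler branch on the comparison directly.

```c
#include <stdio.h>

/* Counts how many times the stand-in BUG() path was taken. */
static int bug_hits;

/* Hypothetical stand-in for the kernel's BUG(): a real kernel executes an
 * unconditional trap here; this sketch only logs and counts, so it can run
 * on any host. */
static void fake_bug(const char *file, int line)
{
    bug_hits++;
    fprintf(stderr, "BUG at %s:%d\n", file, line);
}

#define BUG() fake_bug(__FILE__, __LINE__)

/* Generic style, modeled on include/asm-generic/bug.h: the condition is used
 * directly as a branch predicate, so the compiler is free to emit a compare
 * plus a conditional branch to an out-of-line BUG() path. */
#define BUG_ON_GENERIC(cond)                  \
    do {                                      \
        if (__builtin_expect(!!(cond), 0))    \
            BUG();                            \
    } while (0)

/* Hypothetical model of a conditional trap such as "tdnei rX,0": it fires
 * when the register operand is non-zero. */
static void trap_if_nonzero(long reg, const char *file, int line)
{
    if (reg != 0)
        fake_bug(file, line);
}

/* twnei/tdnei style: the condition must first be turned into an integer
 * value (0 or 1 here) that can be passed as a register operand. */
#define BUG_ON_TRAP_STYLE(cond) \
    trap_if_nonzero((long)!!(cond), __FILE__, __LINE__)

/* Example callers: each checks that p is non-NULL, then returns 0. */
int check_generic(const void *p)
{
    BUG_ON_GENERIC(p == NULL);
    return 0;
}

int check_trap_style(const void *p)
{
    BUG_ON_TRAP_STYLE(p == NULL);
    return 0;
}
```

The sketch says nothing about the instruction counts Segher and Christophe are debating; compiling the two callers for powerpc with gcc -O2 and comparing the objdump output of real kernel builds is the way to check that.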