Message-ID: <20200120172732.GC3191@gate.crashing.org>
Date: Mon, 20 Jan 2020 11:27:32 -0600
From: Segher Boessenkool <segher@...nel.crashing.org>
To: Christophe Leroy <christophe.leroy@....fr>
Cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>, nathanl@...ux.ibm.com,
arnd@...db.de, tglx@...utronix.de, vincenzo.frascino@....com,
luto@...nel.org, x86@...nel.org, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mips@...r.kernel.org
Subject: Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
On Mon, Jan 20, 2020 at 06:08:23PM +0100, Christophe Leroy wrote:
> Not easy I think.
>
> First we have the unavoidable ASM entry function that can't be dropped
> because of the CR[SO] bit, which must be set on error and cleared on no
> error, and that can't be done in C.
Yup.
> In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts
> are generic and read from the VDSO data.
Does that cost more than just a few cycles?
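To make the fixed versus generic distinction concrete, here is a minimal C sketch. The struct, its field, the function names and the shift value 12 are all hypothetical and chosen for illustration only; they are not the kernel's actual VDSO data layout. The first function shifts by a compile-time constant, the second has to load the shift count from memory first and then do a variable 64-bit shift.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared data page; NOT the real kernel layout. */
struct demo_vdso_data {
    uint32_t shift;
};

/* Arbitrary constant, for illustration only. */
#define DEMO_FIXED_SHIFT 12

/* Fixed shift: the count is an immediate known at compile time. */
static uint64_t shift_fixed(uint64_t value)
{
    return value >> DEMO_FIXED_SHIFT;
}

/* Generic shift: the count is loaded from the data structure at run time. */
static uint64_t shift_generic(const struct demo_vdso_data *vd, uint64_t value)
{
    return value >> vd->shift;
}
```

Both return the same result when the stored shift equals the constant. On a 32-bit target the variable form needs a multi-instruction sequence plus the extra load, which is presumably where any additional cycles would come from.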
> And there is still some funny code generated by GCC (8.1), like:
>
> 620: 7d 29 3c 30 srw r9,r9,r7
> 624: 21 87 00 20 subfic r12,r7,32
> 628: 7d 07 3c 31 srw. r7,r8,r7
> 62c: 7d 08 60 30 slw r8,r8,r12
> 630: 7d 0b 4b 78 or r11,r8,r9
(This can be done cheaper for fixed shifts, you can use rlwimi then).
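As a rough C model of what the srw/subfic/slw/or sequence at 620..630 computes, the sketch below builds a 64-bit right shift from two 32-bit halves. It assumes the shift count is below 32, and the helper names are made up for this illustration. The srw() and slw() helpers mimic the PowerPC behaviour of using the low 6 bits of the count and yielding 0 for counts 32..63. The last function models the fixed-shift alternative: rotate the high word and insert its top bits under a mask, which is the operation rlwimi performs in a single instruction.

```c
#include <stdint.h>

/* Models of PowerPC srw/slw: low 6 bits of the count, 0 for counts 32..63. */
static uint32_t srw(uint32_t x, uint32_t n) { n &= 63; return n < 32 ? x >> n : 0; }
static uint32_t slw(uint32_t x, uint32_t n) { n &= 63; return n < 32 ? x << n : 0; }

/* Variable 64-bit right shift from 32-bit halves; assumes 0 <= n < 32. */
static uint64_t shr64_by_halves(uint32_t hi, uint32_t lo, uint32_t n)
{
    uint32_t new_lo = srw(lo, n) | slw(hi, 32 - n); /* srw, subfic, slw, or */
    uint32_t new_hi = srw(hi, n);                   /* srw. (also sets CR0) */
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Low word of a fixed shift, modelled as rotate-and-insert; assumes 1 <= n <= 31. */
static uint32_t shr_lo_fixed(uint32_t hi, uint32_t lo, unsigned n)
{
    uint32_t rot  = (hi << (32 - n)) | (hi >> n); /* rotate hi left by 32 - n */
    uint32_t mask = 0xffffffffu << (32 - n);      /* top n bits of the word */
    return (rot & mask) | ((lo >> n) & ~mask);    /* insert under the mask */
}
```

With a constant count, the subfic and the separate slw/or pair are no longer needed for the low word: one shift of the low half plus one rotate-and-insert from the high half is enough.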
> 634: 39 40 00 00 li r10,0
> 638: 40 82 00 84 bne 6bc <__c_kernel_clock_gettime+0x114>
> 63c: 81 23 00 24 lwz r9,36(r3)
> 640: 81 05 00 00 lwz r8,0(r5)
> ...
> 6bc: 7d 69 5b 78 mr r9,r11
> 6c0: 7c ea 3b 78 mr r10,r7
> 6c4: 7d 2b 4b 78 mr r11,r9
> 6c8: 4b ff ff 74 b 63c <__c_kernel_clock_gettime+0x94>
>
> This branch to 6bc is totally useless:
> - copying r11 into r9 is pointless as r9 is overwritten in 63c
> - copying back r9 into r11 is pointless as r11 has not been modified
> in between.
Yeah, huh, how did that happen.
> - loading r10 with 0 then overwriting r10 with r7 when r7 is not 0 is
> pointless as well, could have directly put the result of srw. in r10.
This may be harder to make the compiler do.
But the r9/r11 thing suggests you are preventing optimisation somewhere,
maybe with some asm? Do you have some small testcase I can compile?
Segher