linux-kernel - Re: [PATCH v3] ARM: add get_user() support for 8 byte types

Message-ID: <53A04888.5010204@linaro.org>
Date:	Tue, 17 Jun 2014 14:54:16 +0100
From:	Daniel Thompson <daniel.thompson@...aro.org>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
CC:	Rob Clark <robdclark@...il.com>,
	Nicolas Pitre <nicolas.pitre@...aro.org>,
	Arnd Bergmann <arnd.bergmann@...aro.org>,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	patches@...aro.org, linaro-kernel@...ts.linaro.org
Subject: Re: [PATCH v3] ARM: add get_user() support for 8 byte types

On 17/06/14 14:36, Russell King - ARM Linux wrote:
> On Tue, Jun 17, 2014 at 02:28:44PM +0100, Daniel Thompson wrote:
>> On 17/06/14 12:09, Russell King - ARM Linux wrote:
>>> On Tue, Jun 17, 2014 at 11:17:23AM +0100, Daniel Thompson wrote:
>>>> ... at this point there is a narrowing cast followed by an implicit
>>>> widening. This results in the compiler either ignoring r3 altogether or, if
>>>> spilling to the stack, generating code to set r3 to zero before doing
>>>> the store.
>>>
>>> In actual fact, there's very little difference between the two
>>> implementations in terms of generated code.
>>>
>>> The difference between them is what happens on the 64-bit big endian
>>> narrowing case, where we use __get_user_4 with your version.  This
>>> adds one additional instruction.
>>
>> Good point.
>>
>>
>>> and 64-bit narrowed to 32-bit:
>>>
>>>         str     lr, [sp, #-4]!
>>> -       mov     ip, r0
>>> +       mov     r3, r0
>>>         mov     r0, r1
>>>  #APP
>>>  @ 275 "t-getuser.c" 1
>>> -       bl      __get_user_8
>>> +       bl      __get_user_4
>>>  @ 0 "" 2
>>> -       str     r2, [ip, #0]
>>> +       str     r2, [r3, #0]
>>>         ldr     pc, [sp], #4
>>
>> The latter case avoids allocating r3 for the __get_user_x call and should
>> reduce register pressure and, potentially, save a few instructions
>> elsewhere (one of my rather large test functions does demonstrate this
>> effect).
>>
>> I don't know if we care about that. If we do I'm certainly happy to put
>> a patch together that exploits this (whilst avoiding the add in the big
>> endian case).
> 
> No need - the + case is your version, the - case is my version.  So your
> version wins on this point. :)

:) Thanks, although credit really goes to Rob Clark...
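
To make the narrowing case concrete, here is a small userspace C model (not kernel code; store_u64(), load_u32(), low_word_offset(), narrowing_read() and narrow_then_widen() are hypothetical names invented for illustration). It assumes a 32-bit destination only keeps the least significant word of a 64-bit user object. That word sits at offset 0 on little endian and at offset 4 on big endian, which is where the extra add in the big endian case comes from. It also shows that a narrowing cast followed by implicit widening always yields a zero high word, consistent with the compiler either ignoring r3 or simply setting it to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out a 64-bit value in 8 bytes of (simulated) user memory the way a
 * little- or big-endian CPU would store it. */
void store_u64(unsigned char mem[8], uint64_t v, int big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? (7 - i) * 8 : i * 8;
		mem[i] = (unsigned char)(v >> shift);
	}
}

/* Read a 32-bit word from mem in the given byte order. */
uint32_t load_u32(const unsigned char *mem, int big_endian)
{
	uint32_t v = 0;
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? (3 - i) * 8 : i * 8;
		v |= (uint32_t)mem[i] << shift;
	}
	return v;
}

/* Offset of the least significant 32-bit word within a 64-bit object:
 * 0 on little endian, 4 on big endian (the extra add). */
size_t low_word_offset(int big_endian)
{
	return big_endian ? 4 : 0;
}

/* Narrowing read: fetch only the 4 bytes that a 32-bit destination keeps,
 * rather than loading all 8 bytes and discarding the high word. */
uint32_t narrowing_read(const unsigned char mem[8], int big_endian)
{
	return load_u32(mem + low_word_offset(big_endian), big_endian);
}

/* Narrowing cast followed by implicit widening: the high word is always
 * zero, so nothing loaded needs to be kept alive for it. */
uint64_t narrow_then_widen(uint64_t v)
{
	uint32_t narrowed = (uint32_t)v; /* keep the low 32 bits */
	uint64_t widened = narrowed;     /* zero-extend back to 64 bits */

	assert((widened >> 32) == 0);
	return widened;
}
```

For example, storing 0x1122334455667788 in either byte order and calling narrowing_read() with the matching byte order returns 0x55667788 in both cases; only the offset differs.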

I think currently:

1. Rob's patch is better for register pressure in the narrowing case
   (above).

2. Your patch is probably better for big endian due to the add in Rob's
   version. I say probably because, without proof, I suspect the cost
   of the add would in most cases outweigh the register pressure
   benefit.

3. Your patch has a better implementation of __get_user_8 (it uses ldrd).

Hence I suspect we need to combine elements from both patches.
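
One hypothetical way such a combination could choose its helper, sketched as a userspace C model of the selection logic only. The enum, its HELPER_* names and pick_helper() are invented for illustration and are not the names used by either patch; sizes are in bytes. The assumption is: use the 4-byte helper for little endian narrowing (no add needed, r3 stays free), and use the full 8-byte helper both for non-narrowing reads and for big endian narrowing (avoiding the add).

```c
#include <stddef.h>

/* Hypothetical names for the out-of-line helpers a get_user() might call. */
enum get_user_helper {
	HELPER_GET_USER_1,
	HELPER_GET_USER_2,
	HELPER_GET_USER_4,
	HELPER_GET_USER_8,
	HELPER_GET_USER_BAD, /* unsupported size */
};

/*
 * Choose a helper from the size of the user object (*ptr) and of the
 * destination (x), both in bytes.
 */
enum get_user_helper pick_helper(size_t src_size, size_t dst_size,
				 int big_endian)
{
	switch (src_size) {
	case 1:
		return HELPER_GET_USER_1;
	case 2:
		return HELPER_GET_USER_2;
	case 4:
		return HELPER_GET_USER_4;
	case 8:
		/* Little-endian narrowing: the low word is at offset 0, so a
		 * 4-byte load is enough and r3 is not tied up. */
		if (dst_size <= 4 && !big_endian)
			return HELPER_GET_USER_4;
		/* Full 64-bit read, or big-endian narrowing, where a 4-byte
		 * load would need an extra add to reach offset 4. */
		return HELPER_GET_USER_8;
	default:
		return HELPER_GET_USER_BAD;
	}
}
```

Under this split the 8-byte helper itself could still be the ldrd-based one, so each of the three points above is kept; whether the big endian trade-off is right remains the unproven judgement from point 2.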
