linux-kernel - Re: [RFC PATCH v1 44/50] arm64: ptr auth: Use get_random_u64 instead of

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200330193237.GC9199@SDF.ORG>
Date:   Mon, 30 Mar 2020 19:32:37 +0000
From:   George Spelvin <lkml@....ORG>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-kernel@...r.kernel.org,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        linux-arm-kernel@...ts.infradead.org, lkml@....org
Subject: Re: [RFC PATCH v1 44/50] arm64: ptr auth: Use get_random_u64 instead
 of _bytes

Sorry for the delay responding; I had to re-set-up my arm64
cross-compilation environment.

On Mon, Mar 30, 2020 at 11:57:45AM +0100, Mark Rutland wrote:
> On Tue, Dec 10, 2019 at 07:15:55AM -0500, George Spelvin wrote:
>> Since these are authentication keys, stored in the kernel as long
>> as they're important, get_random_u64 is fine.  In particular,
>> get_random_bytes has significant per-call overhead, so five
>> separate calls is painful.
> 
> As I am unaware, how does the cost of get_random_bytes() compare to the
> cost of get_random_u64()?

It's approximately 8 times the cost.

Because get_random_bytes() implements anti-backtracking, it's a minimum 
of one global lock and one ChaCha20 operation per call.  Even though 
chacha_block_generic() returns 64 bytes, for anti-backtracking we use 
32 of them to generate a new key and discard the remainder.

get_random_u64() uses the exact same generator, but amortizes the cost by 
storing the output in a per-CPU buffer which it only has to refill every 
64 bytes generated.  7/8 of the time, it's just a fetch from a per-CPU 
data structure.

>> This ended up being a more extensive change, since the previous
>> code was unrolled and 10 calls to get_random_u64() seems excessive.
>> So the code was rearranged to have smaller object size.
> 
> It's not really "unrolled", but rather "not a loop", so I'd prefer to
> not artifially make it look like one.

I intended that to mean "not in a loop, but could be".  I guess
this entire exchange is about the distinction between "could be"
and "should be".  ;-)

Yes, I went overboard, and your proposed change below is perfectly
fine with me.

> Could you please quantify the size difference when going from
> get_random_bytes() to get_random_u64(), if that is excessive enough to
> warrant changing the structure of the code? Otherwise please leave the
> structure as-is as given it is much easier to reason about -- suggestion
> below on how to do that neatly.

Here are the various code sizes:
   text    data     bss     dec     hex filename
   1480       0       0    1480     5c8 arch/arm64/kernel/pointer_auth.o.old
    862       0       0     862     35e arch/arm64/kernel/pointer_auth.o.new
   1492       0       0    1492     5d4 arch/arm64/kernel/pointer_auth.o.new2
   1560       0       0    1560     618 arch/arm64/kernel/pointer_auth.o.new3

"old" is the existing code.  "new" is my restructured code.
"new2" is your simple change with a __ptrauth_key_init() helper.
"new3" is with the helper forced noinline.

I shrank the code significantly, but deciding whether that's a net
improvement is your perogative.

I should mention that at the end of my patch series, I added a function 
(currently called get_random_nonce(), but that's subject to revision) 
which uses the get_random_u64 internals with the same interface as 
get_random_bytes().  We could postpone this whole thing until that gets
a final name and merged.

(BTW, somehow in my patch a "#include <linux/prctl.h>" needed in the 
revised <asm/pointer_auth.h> got omitted.  I probably did something stupid 
like added it in my cross-compilation tree but didn't push it back to my 
main development tree.  Sorry.)