Date: Fri, 7 Jun 2024 17:27:45 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: linux-kernel@...r.kernel.org, patches@...ts.linux.dev,
	tglx@...utronix.de, linux-crypto@...r.kernel.org,
	linux-api@...r.kernel.org, x86@...nel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>,
	Carlos O'Donell <carlos@...hat.com>,
	Florian Weimer <fweimer@...hat.com>, Arnd Bergmann <arnd@...db.de>,
	Jann Horn <jannh@...gle.com>,
	Christian Brauner <brauner@...nel.org>,
	David Hildenbrand <dhildenb@...hat.com>,
	Samuel Neves <sneves@....uc.pt>
Subject: Re: [PATCH v16 5/5] x86: vdso: Wire up getrandom() vDSO
 implementation

On Thu, May 30, 2024 at 08:38:16PM -0700, Eric Biggers wrote:
> On Tue, May 28, 2024 at 02:19:54PM +0200, Jason A. Donenfeld wrote:
> > diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > new file mode 100644
> > index 000000000000..d79e2bd97598
> > --- /dev/null
> > +++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
> > @@ -0,0 +1,178 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2022 Jason A. Donenfeld <Jason@...c4.com>. All Rights Reserved.
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/frame.h>
> > +
> > +.section	.rodata, "a"
> > +.align 16
> > +CONSTANTS:	.octa 0x6b20657479622d323320646e61707865
> > +.text
> > +
> > +/*
> > + * Very basic SSE2 implementation of ChaCha20. Produces a given positive number
> > + * of blocks of output with a nonce of 0, taking an input key and 8-byte
> > + * counter. Importantly does not spill to the stack. Its arguments are:
> > + *
> > + *	rdi: output bytes
> > + *	rsi: 32-byte key input
> > + *	rdx: 8-byte counter input/output
> > + *	rcx: number of 64-byte blocks to write to output
> > + */
> > +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> > +
> > +.set	output,		%rdi
> > +.set	key,		%rsi
> > +.set	counter,	%rdx
> > +.set	nblocks,	%rcx
> > +.set	i,		%al
> > +/* xmm registers are *not* callee-save. */
> > +.set	state0,		%xmm0
> > +.set	state1,		%xmm1
> > +.set	state2,		%xmm2
> > +.set	state3,		%xmm3
> > +.set	copy0,		%xmm4
> > +.set	copy1,		%xmm5
> > +.set	copy2,		%xmm6
> > +.set	copy3,		%xmm7
> > +.set	temp,		%xmm8
> > +.set	one,		%xmm9
> 
> An "interesting" x86_64 quirk: in SSE instructions, registers xmm0-xmm7 take
> fewer bytes to encode than xmm8-xmm15.
> 
> Since 'temp' is used frequently, moving it into the lower range (and moving one
> of the 'copy' registers, which isn't used as frequently, into the higher range)
> decreases the code size of __arch_chacha20_blocks_nostack() by 5%.

That's a nice trick. Thank you very much for it.

Jason
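
The encoding quirk Eric describes can be illustrated with a rough sketch. It hand-encodes only the register-to-register form of one SSE2 instruction, `paddd`, using the standard x86_64 layout (`66 0F FE /r`, plus a REX prefix whenever either operand is xmm8-xmm15). The helper name `encode_paddd` is hypothetical and is not part of the patch; the 5% figure above comes from Eric's measurement of the whole function, not from this sketch.

```python
def encode_paddd(src: int, dst: int) -> bytes:
    """Encode `paddd %xmm<src>, %xmm<dst>` (AT&T operand order).

    Hypothetical illustration helper: handles only the register-register
    form, not memory operands or VEX/EVEX encodings.
    """
    if not (0 <= src <= 15 and 0 <= dst <= 15):
        raise ValueError("xmm register index must be in 0..15")

    # REX is 0100WRXB: R extends ModRM.reg (dst), B extends ModRM.rm (src).
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)
    # mod=11 selects register-direct addressing.
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)

    out = bytearray([0x66])      # mandatory prefix for the SSE2 integer form
    if rex != 0x40:              # REX is only emitted when a high register is used
        out.append(rex)
    out += bytes([0x0F, 0xFE, modrm])
    return bytes(out)


if __name__ == "__main__":
    for src, dst in [(1, 0), (8, 0), (1, 8)]:
        code = encode_paddd(src, dst)
        print(f"paddd %xmm{src}, %xmm{dst}: {code.hex(' ')} ({len(code)} bytes)")
```

Each instruction that touches xmm8 or above costs one extra byte, so moving a frequently used register such as `temp` into xmm0-xmm7 saves a byte at every use.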
