linux-kernel - Re: [PATCH] LoongArch: vDSO: Tune the chacha20 implementation

Message-ID: <Zu2Qif3n7oIMweJ2@zx2c4.com>
Date: Fri, 20 Sep 2024 17:11:05 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Xi Ruoyao <xry111@...111.site>
Cc: Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
	Christophe Leroy <christophe.leroy@...roup.eu>,
	linux-crypto@...r.kernel.org, loongarch@...ts.linux.dev,
	linux-kernel@...r.kernel.org, Jinyang He <hejinyang@...ngson.cn>,
	Tiezhu Yang <yangtiezhu@...ngson.cn>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH] LoongArch: vDSO: Tune the chacha20 implementation

On Thu, Sep 19, 2024 at 05:13:59PM +0800, Xi Ruoyao wrote:
> As Christophe pointed out, tuning the chacha20 implementation by
> scheduling the instructions the way GCC does can improve
> performance.
> 
> The tuning does not introduce too much complexity (basically it's just
> reordering some instructions).  And the tuning does not hurt readability
> too much: actually the tuned code looks even more similar to a
> textbook-style implementation based on 128-bit vectors.  So overall it's
> a good deal to me.
> 
> Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
> On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
> with a lower issue rate.
> 
> Suggested-by: Christophe Leroy <christophe.leroy@...roup.eu>
> Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
> Signed-off-by: Xi Ruoyao <xry111@...111.site>

That seems like a reasonable optimization to me. I'll queue it up in
random.git and send it in my pull next week.

Thanks.

Jason