Message-ID: <Zu2Qif3n7oIMweJ2@zx2c4.com>
Date: Fri, 20 Sep 2024 17:11:05 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Xi Ruoyao <xry111@...111.site>
Cc: Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Christophe Leroy <christophe.leroy@...roup.eu>,
linux-crypto@...r.kernel.org, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org, Jinyang He <hejinyang@...ngson.cn>,
Tiezhu Yang <yangtiezhu@...ngson.cn>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH] LoongArch: vDSO: Tune the chacha20 implementation

On Thu, Sep 19, 2024 at 05:13:59PM +0800, Xi Ruoyao wrote:
> As Christophe pointed out, tuning the chacha20 implementation by
> scheduling the instructions like what GCC does can improve the
> performance.
>
> The tuning does not introduce too much complexity (basically it's just
> reordering some instructions). And the tuning does not hurt readability
> too much: actually the tuned code looks even more similar to a
> textbook-style implementation based on 128-bit vectors. So overall it's
> a good deal to me.
>
> Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
> On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
> with a lower issue rate.
>
> Suggested-by: Christophe Leroy <christophe.leroy@...roup.eu>
> Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@csgroup.eu/
> Signed-off-by: Xi Ruoyao <xry111@...111.site>

That seems like a reasonable optimization to me. I'll queue it up in
random.git and send it in my pull next week.

Thanks.
Jason