Date: Sun, 17 Dec 2023 18:10:54 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Ivan Orlov' <ivan.orlov0322@...il.com>, "paul.walmsley@...ive.com"
	<paul.walmsley@...ive.com>, "palmer@...belt.com" <palmer@...belt.com>,
	"aou@...s.berkeley.edu" <aou@...s.berkeley.edu>
CC: "conor.dooley@...rochip.com" <conor.dooley@...rochip.com>,
	"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "samuel@...lland.org"
	<samuel@...lland.org>, "alexghiti@...osinc.com" <alexghiti@...osinc.com>,
	"linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"skhan@...uxfoundation.org" <skhan@...uxfoundation.org>
Subject: RE: [PATCH] riscv: lib: Optimize 'strlen' function

From: Ivan Orlov
> Sent: 13 December 2023 15:46

Looking at the old code...

>  1:
> -	lbu	t0, 0(t1)
> -	beqz	t0, 2f
> -	addi	t1, t1, 1
> -	j	1b

I suspect there is (at least) a two clock stall between
the 'lbu' and 'beqz'.
Allowing for one clock for the 'predicted taken' branch,
that is 7 clocks/byte.

Try this one - especially on 32bit:

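	mv	t0, a0
	andi	t1, t0, 1		/* t1 = 1 if the start address is odd */
	sub	t0, t0, t1		/* round t0 down to an even address */
	bnez	t1, 2f			/* odd start: skip the first load, t1 stays non-zero */
1:
	lbu	t1, 0(t0)
2:	lbu	t2, 1(t0)
	addi	t0, t0, 2
	beqz	t1, 3f			/* NUL in the even byte */
	bnez	t2, 1b			/* no NUL in either byte: next pair */
	addi	t0, t0, 1		/* NUL in the odd byte */
3:	addi	t0, t0, -2
	sub	a0, t0, a0
	ret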

Might be 6 clocks for 2 bytes.
The much smaller cache footprint will also help.
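
For anyone who wants to check the control flow without a RISC-V
toolchain, here is a rough C model of the loop above. It is only a
sketch: the names strlen_pairs, load_byte and strlen_pairs_selfcheck
are made up for illustration, and it assumes (as the assembly does)
that reading both bytes of a 2-byte aligned pair is safe even when the
NUL is the first byte of the pair. That is true for real memory pages
but is not something portable C guarantees, so the self-check uses a
buffer with spare room.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one byte load, standing in for 'lbu'. */
static unsigned char load_byte(uintptr_t addr)
{
    return *(const unsigned char *)addr;
}

/* Hypothetical C model of the two-bytes-per-iteration loop. */
static size_t strlen_pairs(const char *s)
{
    uintptr_t start = (uintptr_t)s;
    uintptr_t odd = start & 1;
    uintptr_t addr = start - odd;   /* round down to an even address */
    unsigned char b0 = 1;           /* odd start: treat the skipped byte as non-zero */
    unsigned char b1;

    if (!odd)
        b0 = load_byte(addr);
    for (;;) {
        b1 = load_byte(addr + 1);
        addr += 2;
        if (b0 == 0)                /* NUL in the even byte of the pair */
            return (size_t)(addr - 2 - start);
        if (b1 == 0)                /* NUL in the odd byte of the pair */
            return (size_t)(addr - 1 - start);
        b0 = load_byte(addr);
    }
}

/* Hypothetical self-check: compare against strlen() for even and odd starts. */
static void strlen_pairs_selfcheck(void)
{
    _Alignas(8) char buf[16] = "xhello";

    assert(strlen_pairs(buf) == strlen(buf));
    assert(strlen_pairs(buf + 1) == strlen(buf + 1));
}
```

The odd-start case is the part worth staring at: the first load is
skipped and the "even" byte is simply pretended to be non-zero, which
is what the 'bnez t1, 2f' does with the leftover value in t1.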

	David
