lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <de80b4c7-1ffb-478e-9117-9d5b829470bd@gmail.com>
Date: Mon, 18 Dec 2023 01:41:33 +0000
From: Ivan Orlov <ivan.orlov0322@...il.com>
To: David Laight <David.Laight@...LAB.COM>, "paul.walmsley@...ive.com" <paul.walmsley@...ive.com>, "palmer@...belt.com" <palmer@...belt.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>
Cc: "conor.dooley@...rochip.com" <conor.dooley@...rochip.com>, "ajones@...tanamicro.com" <ajones@...tanamicro.com>, "samuel@...lland.org" <samuel@...lland.org>, "alexghiti@...osinc.com" <alexghiti@...osinc.com>, "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "skhan@...uxfoundation.org" <skhan@...uxfoundation.org>
Subject: Re: [PATCH] riscv: lib: Optimize 'strlen' function

On 12/17/23 17:00, David Laight wrote:
> I'd also guess that pretty much all the calls in-kernel are short.
> You might try counting as: histogram[ilog2(strlen_result)]++
> and seeing what it shows for some workload.
> I bet you (a beer if I see you!) that you won't see many over 1k.

Hi David,

Here are the statistics for the strlen results:

[ 223.169575] Calls count for 2^0: 6150
[ 223.173293] Calls count for 2^1: 184852
[ 223.177142] Calls count for 2^2: 313896
[ 223.180990] Calls count for 2^3: 185844
[ 223.184881] Calls count for 2^4: 87868
[ 223.188660] Calls count for 2^5: 9916
[ 223.192368] Calls count for 2^6: 1865
[ 223.196062] Calls count for 2^7: 0
[ 223.199483] Calls count for 2^8: 0
[ 223.202952] Calls count for 2^9: 0
...

Looks like I've just lost a beer :)

Considering these statistics, I'd say implementing the word-oriented strlen is an overcomplication: we wouldn't get any performance gain, and it just isn't worth it.

I simplified your code a little bit, since the alignment there seems to be unnecessary: the QEMU test shows the same performance regardless of alignment. Tests on the board gave the same result (perhaps because the CPU on the board has 2 DDR channels?)

	mv	t0, a0
1:
	lbu	t1, 0(a0)
	lbu	t2, 1(a0)
	addi	a0, a0, 2
	beqz	t1, 2f
	bnez	t2, 1b
	addi	a0, a0, 1
2:
	addi	a0, a0, -2
	sub	a0, a0, t0
	ret

If it looks good to you, would you mind if I send the patch with it? Could I add you in a Suggested-by tag?

--
Kind regards,
Ivan Orlov
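To make the control flow of the loop above easier to follow, here is a minimal C model of the same idea: a byte-wise strlen unrolled to examine two bytes per iteration. This is a sketch for illustration only; the name `strlen_unrolled2` is hypothetical and not a kernel symbol. One deliberate difference: the assembly loads the byte at offset i + 1 before testing the byte at offset i, so when the NUL sits at an even offset it reads one byte past the terminator. The C model tests each byte before touching the next one, so it stays within the string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical C model of a two-bytes-per-iteration strlen.
 * Unlike the RISC-V assembly, which loads s[i + 1] before testing s[i],
 * this version checks each byte before reading the next one, so it
 * never reads past the terminating NUL.
 */
static size_t strlen_unrolled2(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	for (;;) {
		if (p[0] == '\0')		/* NUL at an even offset */
			return (size_t)(p - s);
		if (p[1] == '\0')		/* NUL at an odd offset */
			return (size_t)(p - s) + 1;
		p += 2;				/* both bytes non-NUL: next pair */
	}
}
```

For example, `strlen_unrolled2("abc")` returns 3: the first iteration consumes "ab", and the second iteration finds 'c' at offset 2 and the NUL at offset 3 via `p[1]`.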