Message-ID: <563a5d0d-c27a-45de-9495-a82403026886@kernel.org>
Date: Sat, 3 Jan 2026 09:00:55 +0100
From: "Christophe Leroy (CS GROUP)" <chleroy@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>, "Jason A. Donenfeld"
 <Jason@...c4.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Huacai Chen <chenhuacai@...nel.org>,
 Madhavan Srinivasan <maddy@...ux.ibm.com>,
 Michael Ellerman <mpe@...erman.id.au>, Paul Walmsley <pjw@...nel.org>,
 Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
 Alexander Gordeev <agordeev@...ux.ibm.com>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 Kees Cook <kees@...nel.org>, "Gustavo A. R. Silva" <gustavoars@...nel.org>,
 Arnd Bergmann <arnd@...db.de>, Mark Rutland <mark.rutland@....com>,
 Ard Biesheuvel <ardb@...nel.org>, Jeremy Linton <jeremy.linton@....com>,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 loongarch@...ts.linux.dev, linuxppc-dev@...ts.ozlabs.org,
 linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
 linux-hardening@...r.kernel.org
Subject: Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to
 __always_inline



On 02/01/2026 at 15:09, Ryan Roberts wrote:
> On 02/01/2026 13:39, Jason A. Donenfeld wrote:
>> Hi Ryan,
>>
>> On Fri, Jan 2, 2026 at 2:12 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>> context. Given the function is just a handful of operations and doesn't
>>
>> How many? What's this looking like in terms of assembly?
> 
> 25 instructions on arm64:

31 instructions on powerpc:

00000000 <prandom_u32_state>:
    0:	7c 69 1b 78 	mr      r9,r3
    4:	80 63 00 00 	lwz     r3,0(r3)
    8:	80 89 00 08 	lwz     r4,8(r9)
    c:	81 69 00 04 	lwz     r11,4(r9)
   10:	80 a9 00 0c 	lwz     r5,12(r9)
   14:	54 67 30 32 	slwi    r7,r3,6
   18:	7c e7 1a 78 	xor     r7,r7,r3
   1c:	55 66 10 3a 	slwi    r6,r11,2
   20:	54 88 68 24 	slwi    r8,r4,13
   24:	54 63 90 18 	rlwinm  r3,r3,18,0,12
   28:	7d 6b 32 78 	xor     r11,r11,r6
   2c:	7d 08 22 78 	xor     r8,r8,r4
   30:	54 aa 18 38 	slwi    r10,r5,3
   34:	54 e7 9b 7e 	srwi    r7,r7,13
   38:	7c e7 1a 78 	xor     r7,r7,r3
   3c:	51 66 2e fe 	rlwimi  r6,r11,5,27,31
   40:	54 84 38 28 	rlwinm  r4,r4,7,0,20
   44:	7d 4a 2a 78 	xor     r10,r10,r5
   48:	55 08 5d 7e 	srwi    r8,r8,21
   4c:	7d 08 22 78 	xor     r8,r8,r4
   50:	7c e3 32 78 	xor     r3,r7,r6
   54:	54 a5 68 16 	rlwinm  r5,r5,13,0,11
   58:	55 4a a3 3e 	srwi    r10,r10,12
   5c:	7d 4a 2a 78 	xor     r10,r10,r5
   60:	7c 63 42 78 	xor     r3,r3,r8
   64:	90 e9 00 00 	stw     r7,0(r9)
   68:	90 c9 00 04 	stw     r6,4(r9)
   6c:	91 09 00 08 	stw     r8,8(r9)
   70:	91 49 00 0c 	stw     r10,12(r9)
   74:	7c 63 52 78 	xor     r3,r3,r10
   78:	4e 80 00 20 	blr

Among those, 8 instructions (4 lwz and 4 stw) are for reading/writing the 
state on the stack. They of course disappear when inlining.
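
For readers mapping the listing back to C, here is a minimal userspace 
sketch of the function. It is written from memory of lib/random32.c, not 
copied from the patch, so treat the constants as an assumption; they are 
at least consistent with the shift and mask amounts visible in the 
listing. The name prandom_u32_state_sketch and the uint32_t-based struct 
are stand-ins for the kernel's own definitions.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct rnd_state (four 32-bit words). */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

/* One Tausworthe step: ((s & c) << d) ^ (((s << a) ^ s) >> b) */
#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/*
 * Out-of-line shape: each call reads the four state words through the
 * pointer and writes them back, which is where the 4 lwz and 4 stw in
 * the listing come from. Constants are recalled, not taken from the patch.
 */
uint32_t prandom_u32_state_sketch(struct rnd_state *state)
{
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}
```

The remaining 23 instructions are the shifts, masks and xors of the four 
TAUSWORTHE() steps, the final xor chain, one register move and the return.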

> 
>> It'd also be
>> nice to have some brief analysis of other call sites to have
>> confirmation this isn't blowing up other users.
> 
> I compiled defconfig before and after this patch on arm64 and compared the text
> sizes:
> 
> $ ./scripts/bloat-o-meter -t vmlinux.before vmlinux.after
> add/remove: 3/4 grow/shrink: 4/1 up/down: 836/-128 (708)
> Function                                     old     new   delta
> prandom_seed_full_state                      364     932    +568
> pick_next_task_fair                         1940    2036     +96
> bpf_user_rnd_u32                             104     196     +92
> prandom_bytes_state                          204     260     +56
> e843419@...b_00012d69_e34                      -       8      +8
> e843419@...7_00010ec3_23ec                     -       8      +8
> e843419@...b_00003767_25c                      -       8      +8
> bpf_prog_select_runtime                      448     444      -4
> e843419@...3_0000cfd1_1580                     8       -      -8
> e843419@...2_0000cfba_147c                     8       -      -8
> e843419@...f_00008d8c_184                      8       -      -8
> prandom_u32_state                            100       -    -100
> Total: Before=19078072, After=19078780, chg +0.00%
> 
> So 708 bytes more after inlining. The main cost is prandom_seed_full_state(),
> which calls prandom_u32_state() 10 times (via prandom_warmup()). I expect we
> could turn that into a loop to reduce ~450 bytes overall.
> 
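
As a sanity check on the figures quoted above, the per-symbol deltas can 
be summed to reproduce the up/down totals and the net change. This is 
only a sketch; the array is copied by hand from the table and sum_deltas 
is a made-up helper name, not part of bloat-o-meter.

```c
#include <stddef.h>

/* Per-symbol deltas copied by hand from the bloat-o-meter table above. */
static const long deltas[] = {
	568, 96, 92, 56, 8, 8, 8,	/* grown or added symbols */
	-4, -8, -8, -8, -100,		/* shrunk or removed symbols */
};

/* Sum growth and shrinkage separately, the way the summary line reports them. */
static void sum_deltas(const long *d, size_t n, long *up, long *down)
{
	size_t i;

	*up = 0;
	*down = 0;
	for (i = 0; i < n; i++) {
		if (d[i] > 0)
			*up += d[i];
		else
			*down += d[i];
	}
}
```

Summing gives up = 836 and down = -128, so a net of 708 bytes, which 
matches After - Before = 19078780 - 19078072.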
With the following change, the increase in prandom_seed_full_state() 
remains reasonable, and performance-wise it is a lot better because it 
avoids reading and writing the state via the stack on every call:

diff --git a/lib/random32.c b/lib/random32.c
index 24e7acd9343f6..28a5b109c9018 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -94,17 +94,11 @@ EXPORT_SYMBOL(prandom_bytes_state);

  static void prandom_warmup(struct rnd_state *state)
  {
+	int i;
+
  	/* Calling RNG ten times to satisfy recurrence condition */
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
+	for (i = 0; i < 10; i++)
+		prandom_u32_state(state);
  }

  void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)

The loop is:

  248:	38 e0 00 0a 	li      r7,10
  24c:	7c e9 03 a6 	mtctr   r7
  250:	55 05 30 32 	slwi    r5,r8,6
  254:	55 46 68 24 	slwi    r6,r10,13
  258:	55 27 18 38 	slwi    r7,r9,3
  25c:	7c a5 42 78 	xor     r5,r5,r8
  260:	7c c6 52 78 	xor     r6,r6,r10
  264:	7c e7 4a 78 	xor     r7,r7,r9
  268:	54 8b 10 3a 	slwi    r11,r4,2
  26c:	7d 60 22 78 	xor     r0,r11,r4
  270:	54 a5 9b 7e 	srwi    r5,r5,13
  274:	55 08 90 18 	rlwinm  r8,r8,18,0,12
  278:	54 c6 5d 7e 	srwi    r6,r6,21
  27c:	55 4a 38 28 	rlwinm  r10,r10,7,0,20
  280:	54 e7 a3 3e 	srwi    r7,r7,12
  284:	55 29 68 16 	rlwinm  r9,r9,13,0,11
  288:	7d 64 5b 78 	mr      r4,r11
  28c:	7c a8 42 78 	xor     r8,r5,r8
  290:	7c ca 52 78 	xor     r10,r6,r10
  294:	7c e9 4a 78 	xor     r9,r7,r9
  298:	50 04 2e fe 	rlwimi  r4,r0,5,27,31
  29c:	42 00 ff b4 	bdnz    250 <prandom_seed_full_state+0x7c>

It replaces the 10 calls to prandom_u32_state():

   fc:	91 3f 00 0c 	stw     r9,12(r31)
  100:	7f e3 fb 78 	mr      r3,r31
  104:	48 00 00 01 	bl      104 <prandom_seed_full_state+0x88>
			104: R_PPC_REL24	prandom_u32_state
  108:	7f e3 fb 78 	mr      r3,r31
  10c:	48 00 00 01 	bl      10c <prandom_seed_full_state+0x90>
			10c: R_PPC_REL24	prandom_u32_state
  110:	7f e3 fb 78 	mr      r3,r31
  114:	48 00 00 01 	bl      114 <prandom_seed_full_state+0x98>
			114: R_PPC_REL24	prandom_u32_state
  118:	7f e3 fb 78 	mr      r3,r31
  11c:	48 00 00 01 	bl      11c <prandom_seed_full_state+0xa0>
			11c: R_PPC_REL24	prandom_u32_state
  120:	7f e3 fb 78 	mr      r3,r31
  124:	48 00 00 01 	bl      124 <prandom_seed_full_state+0xa8>
			124: R_PPC_REL24	prandom_u32_state
  128:	7f e3 fb 78 	mr      r3,r31
  12c:	48 00 00 01 	bl      12c <prandom_seed_full_state+0xb0>
			12c: R_PPC_REL24	prandom_u32_state
  130:	7f e3 fb 78 	mr      r3,r31
  134:	48 00 00 01 	bl      134 <prandom_seed_full_state+0xb8>
			134: R_PPC_REL24	prandom_u32_state
  138:	7f e3 fb 78 	mr      r3,r31
  13c:	48 00 00 01 	bl      13c <prandom_seed_full_state+0xc0>
			13c: R_PPC_REL24	prandom_u32_state
  140:	7f e3 fb 78 	mr      r3,r31
  144:	48 00 00 01 	bl      144 <prandom_seed_full_state+0xc8>
			144: R_PPC_REL24	prandom_u32_state
  148:	80 01 00 24 	lwz     r0,36(r1)
  14c:	7f e3 fb 78 	mr      r3,r31
  150:	83 e1 00 1c 	lwz     r31,28(r1)
  154:	7c 08 03 a6 	mtlr    r0
  158:	38 21 00 20 	addi    r1,r1,32
  15c:	48 00 00 00 	b       15c <prandom_seed_full_state+0xe0>
			15c: R_PPC_REL24	prandom_u32_state


So the code size is approximately the same in number of instructions, 
with better performance.
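
In C terms, what the compiler does with the inlined loop is roughly the 
following: load the state once, iterate with the four words held in 
registers, and store them back once at the end. This is a userspace 
sketch under assumptions, not the kernel code: prandom_warmup_sketch and 
its rounds parameter are hypothetical, and the TAUSWORTHE constants are 
recalled from lib/random32.c rather than taken from the patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct rnd_state. */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

/* One Tausworthe step: ((s & c) << d) ^ (((s << a) ^ s) >> b) */
#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/*
 * Hypothetical helper: run `rounds` generator steps with the state kept
 * in local variables, discarding the outputs as the warmup does.
 */
void prandom_warmup_sketch(struct rnd_state *state, int rounds)
{
	uint32_t s1 = state->s1, s2 = state->s2;
	uint32_t s3 = state->s3, s4 = state->s4;
	int i;

	for (i = 0; i < rounds; i++) {
		/* No loads or stores here, only shifts, masks and xors. */
		s1 = TAUSWORTHE(s1,  6U, 13U, 4294967294U, 18U);
		s2 = TAUSWORTHE(s2,  2U, 27U, 4294967288U,  2U);
		s3 = TAUSWORTHE(s3, 13U, 21U, 4294967280U,  7U);
		s4 = TAUSWORTHE(s4,  3U, 12U, 4294967168U, 13U);
	}

	/* Single write-back after the loop. */
	state->s1 = s1;
	state->s2 = s2;
	state->s3 = s3;
	state->s4 = s4;
}
```

Calling prandom_warmup_sketch(&state, 10) corresponds to the warmup in 
the diff. The return value of each step is unused there, so the final 
xor of the four words drops out, which is why it does not appear in the 
loop listing.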

> I'm not really sure if 708 is good or bad...

That's in the noise compared to the overall size of vmlinux, but if we 
change it to a loop we also reduce pressure on the cache.

Christophe
