Message-ID: <563a5d0d-c27a-45de-9495-a82403026886@kernel.org>
Date: Sat, 3 Jan 2026 09:00:55 +0100
From: "Christophe Leroy (CS GROUP)" <chleroy@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>, "Jason A. Donenfeld"
<Jason@...c4.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Huacai Chen <chenhuacai@...nel.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>, Paul Walmsley <pjw@...nel.org>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Kees Cook <kees@...nel.org>, "Gustavo A. R. Silva" <gustavoars@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Mark Rutland <mark.rutland@....com>,
Ard Biesheuvel <ardb@...nel.org>, Jeremy Linton <jeremy.linton@....com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
loongarch@...ts.linux.dev, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-hardening@...r.kernel.org
Subject: Re: [PATCH v3 2/3] prandom: Convert prandom_u32_state() to
__always_inline
On 02/01/2026 at 15:09, Ryan Roberts wrote:
> On 02/01/2026 13:39, Jason A. Donenfeld wrote:
>> Hi Ryan,
>>
>> On Fri, Jan 2, 2026 at 2:12 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>> context. Given the function is just a handful of operations and doesn't
>>
>> How many? What's this looking like in terms of assembly?
>
> 25 instructions on arm64:
31 instructions on powerpc:
00000000 <prandom_u32_state>:
0: 7c 69 1b 78 mr r9,r3
4: 80 63 00 00 lwz r3,0(r3)
8: 80 89 00 08 lwz r4,8(r9)
c: 81 69 00 04 lwz r11,4(r9)
10: 80 a9 00 0c lwz r5,12(r9)
14: 54 67 30 32 slwi r7,r3,6
18: 7c e7 1a 78 xor r7,r7,r3
1c: 55 66 10 3a slwi r6,r11,2
20: 54 88 68 24 slwi r8,r4,13
24: 54 63 90 18 rlwinm r3,r3,18,0,12
28: 7d 6b 32 78 xor r11,r11,r6
2c: 7d 08 22 78 xor r8,r8,r4
30: 54 aa 18 38 slwi r10,r5,3
34: 54 e7 9b 7e srwi r7,r7,13
38: 7c e7 1a 78 xor r7,r7,r3
3c: 51 66 2e fe rlwimi r6,r11,5,27,31
40: 54 84 38 28 rlwinm r4,r4,7,0,20
44: 7d 4a 2a 78 xor r10,r10,r5
48: 55 08 5d 7e srwi r8,r8,21
4c: 7d 08 22 78 xor r8,r8,r4
50: 7c e3 32 78 xor r3,r7,r6
54: 54 a5 68 16 rlwinm r5,r5,13,0,11
58: 55 4a a3 3e srwi r10,r10,12
5c: 7d 4a 2a 78 xor r10,r10,r5
60: 7c 63 42 78 xor r3,r3,r8
64: 90 e9 00 00 stw r7,0(r9)
68: 90 c9 00 04 stw r6,4(r9)
6c: 91 09 00 08 stw r8,8(r9)
70: 91 49 00 0c stw r10,12(r9)
74: 7c 63 52 78 xor r3,r3,r10
78: 4e 80 00 20 blr
Among those, 8 instructions (the four lwz and the four stw) only read and
write the state on the stack. They of course disappear when inlining.
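For reference, here is a userspace sketch in C of what the listing above computes. The shift counts and masks are read back from the disassembly (6/13/18, 2/27/2, 13/21/7, 3/12/13) and are assumed to match the TAUSWORTHE() constants in lib/random32.c. The struct below and the _sketch suffix are stand-ins for illustration, not the kernel definitions.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct rnd_state: four 32-bit words. */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

/*
 * One Tausworthe component: mask the word with c and shift it left by d,
 * then xor with the feedback term ((s << a) ^ s) >> b.
 */
#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/*
 * Out of line, the four state words are loaded and stored through the
 * pointer on every call (the 4 lwz + 4 stw in the listing). Once inlined
 * into a caller that owns the state, the compiler is free to keep them
 * in registers instead.
 */
static inline uint32_t prandom_u32_state_sketch(struct rnd_state *state)
{
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}
```

Calling it on a struct initialised to { 2, 8, 16, 128 } makes the four feedback terms zero, so only the mask-and-shift part of each component contributes, which is easy to check by hand.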
>
>> It'd also be
>> nice to have some brief analysis of other call sites to have
>> confirmation this isn't blowing up other users.
>
> I compiled defconfig before and after this patch on arm64 and compared the text
> sizes:
>
> $ ./scripts/bloat-o-meter -t vmlinux.before vmlinux.after
> add/remove: 3/4 grow/shrink: 4/1 up/down: 836/-128 (708)
> Function old new delta
> prandom_seed_full_state 364 932 +568
> pick_next_task_fair 1940 2036 +96
> bpf_user_rnd_u32 104 196 +92
> prandom_bytes_state 204 260 +56
> e843419@...b_00012d69_e34 - 8 +8
> e843419@...7_00010ec3_23ec - 8 +8
> e843419@...b_00003767_25c - 8 +8
> bpf_prog_select_runtime 448 444 -4
> e843419@...3_0000cfd1_1580 8 - -8
> e843419@...2_0000cfba_147c 8 - -8
> e843419@...f_00008d8c_184 8 - -8
> prandom_u32_state 100 - -100
> Total: Before=19078072, After=19078780, chg +0.00%
>
> So 708 bytes more after inlining. The main cost is prandom_seed_full_state(),
> which calls prandom_u32_state() 10 times (via prandom_warmup()). I expect we
> could turn that into a loop to reduce ~450 bytes overall.
>
With the following change, the increase in prandom_seed_full_state() remains
reasonable, and performance-wise it is a lot better because it avoids
reading and writing the state via the stack:
diff --git a/lib/random32.c b/lib/random32.c
index 24e7acd9343f6..28a5b109c9018 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -94,17 +94,11 @@ EXPORT_SYMBOL(prandom_bytes_state);
 
 static void prandom_warmup(struct rnd_state *state)
 {
+	int i;
+
 	/* Calling RNG ten times to satisfy recurrence condition */
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
-	prandom_u32_state(state);
+	for (i = 0; i < 10; i++)
+		prandom_u32_state(state);
 }
 
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
The loop is:
248: 38 e0 00 0a li r7,10
24c: 7c e9 03 a6 mtctr r7
250: 55 05 30 32 slwi r5,r8,6
254: 55 46 68 24 slwi r6,r10,13
258: 55 27 18 38 slwi r7,r9,3
25c: 7c a5 42 78 xor r5,r5,r8
260: 7c c6 52 78 xor r6,r6,r10
264: 7c e7 4a 78 xor r7,r7,r9
268: 54 8b 10 3a slwi r11,r4,2
26c: 7d 60 22 78 xor r0,r11,r4
270: 54 a5 9b 7e srwi r5,r5,13
274: 55 08 90 18 rlwinm r8,r8,18,0,12
278: 54 c6 5d 7e srwi r6,r6,21
27c: 55 4a 38 28 rlwinm r10,r10,7,0,20
280: 54 e7 a3 3e srwi r7,r7,12
284: 55 29 68 16 rlwinm r9,r9,13,0,11
288: 7d 64 5b 78 mr r4,r11
28c: 7c a8 42 78 xor r8,r5,r8
290: 7c ca 52 78 xor r10,r6,r10
294: 7c e9 4a 78 xor r9,r7,r9
298: 50 04 2e fe rlwimi r4,r0,5,27,31
29c: 42 00 ff b4 bdnz 250 <prandom_seed_full_state+0x7c>
Which replaces the 10 calls to prandom_u32_state() (nine bl plus a final
tail-call b):
fc: 91 3f 00 0c stw r9,12(r31)
100: 7f e3 fb 78 mr r3,r31
104: 48 00 00 01 bl 104 <prandom_seed_full_state+0x88>
104: R_PPC_REL24 prandom_u32_state
108: 7f e3 fb 78 mr r3,r31
10c: 48 00 00 01 bl 10c <prandom_seed_full_state+0x90>
10c: R_PPC_REL24 prandom_u32_state
110: 7f e3 fb 78 mr r3,r31
114: 48 00 00 01 bl 114 <prandom_seed_full_state+0x98>
114: R_PPC_REL24 prandom_u32_state
118: 7f e3 fb 78 mr r3,r31
11c: 48 00 00 01 bl 11c <prandom_seed_full_state+0xa0>
11c: R_PPC_REL24 prandom_u32_state
120: 7f e3 fb 78 mr r3,r31
124: 48 00 00 01 bl 124 <prandom_seed_full_state+0xa8>
124: R_PPC_REL24 prandom_u32_state
128: 7f e3 fb 78 mr r3,r31
12c: 48 00 00 01 bl 12c <prandom_seed_full_state+0xb0>
12c: R_PPC_REL24 prandom_u32_state
130: 7f e3 fb 78 mr r3,r31
134: 48 00 00 01 bl 134 <prandom_seed_full_state+0xb8>
134: R_PPC_REL24 prandom_u32_state
138: 7f e3 fb 78 mr r3,r31
13c: 48 00 00 01 bl 13c <prandom_seed_full_state+0xc0>
13c: R_PPC_REL24 prandom_u32_state
140: 7f e3 fb 78 mr r3,r31
144: 48 00 00 01 bl 144 <prandom_seed_full_state+0xc8>
144: R_PPC_REL24 prandom_u32_state
148: 80 01 00 24 lwz r0,36(r1)
14c: 7f e3 fb 78 mr r3,r31
150: 83 e1 00 1c lwz r31,28(r1)
154: 7c 08 03 a6 mtlr r0
158: 38 21 00 20 addi r1,r1,32
15c: 48 00 00 00 b 15c <prandom_seed_full_state+0xe0>
15c: R_PPC_REL24 prandom_u32_state
So approximately the same size (22 instructions in the loop listing versus 25
in the call sequence as listed), with better performance.
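To make the comparison concrete, below is a small userspace C sketch of the two warm-up shapes: ten back-to-back calls through the state pointer, and a counted loop that keeps the four words in locals and writes them back once, which is roughly what the inlined loop listing does. The struct, the step() helper and the warmup_* names are hypothetical stand-ins, and the TAUSWORTHE() constants are assumed to match lib/random32.c.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct rnd_state. */
struct rnd_state {
	uint32_t s1, s2, s3, s4;
};

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Out-of-line style step: loads and stores all four words on each call. */
static uint32_t step(struct rnd_state *state)
{
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}

/* Shape of the code removed by the diff: ten separate calls. */
static void warmup_unrolled(struct rnd_state *state)
{
	step(state); step(state); step(state); step(state); step(state);
	step(state); step(state); step(state); step(state); step(state);
}

/*
 * Shape of the inlined loop: the state lives in locals for all ten
 * iterations and is stored back once. The return value of each step
 * is unused during warm-up, so the final xor is not computed at all.
 */
static void warmup_loop(struct rnd_state *state)
{
	uint32_t s1 = state->s1, s2 = state->s2;
	uint32_t s3 = state->s3, s4 = state->s4;
	int i;

	for (i = 0; i < 10; i++) {
		s1 = TAUSWORTHE(s1,  6U, 13U, 4294967294U, 18U);
		s2 = TAUSWORTHE(s2,  2U, 27U, 4294967288U,  2U);
		s3 = TAUSWORTHE(s3, 13U, 21U, 4294967280U,  7U);
		s4 = TAUSWORTHE(s4,  3U, 12U, 4294967168U, 13U);
	}

	state->s1 = s1;
	state->s2 = s2;
	state->s3 = s3;
	state->s4 = s4;
}
```

Both functions leave the state identical for any starting value. The difference is only in how many loads, stores and branches are executed to get there.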
> I'm not really sure if 708 is good or bad...
That's in the noise compared to the overall size of vmlinux, but if we
change it to a loop we also reduce pressure on the cache.
Christophe