lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <05B06AD6-0AB2-4E37-A71D-0D917070FF0F@zytor.com>
Date: Sun, 21 Dec 2025 13:39:48 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Eric Dumazet <edumazet@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Uros Bizjak <ubizjak@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>, x86@...nel.org
CC: linux-kernel <linux-kernel@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH 2/2] x86/irqflags: Use ASM_OUTPUT_RM in native_save_fl()

On December 19, 2025 3:20:07 AM PST, Eric Dumazet <edumazet@...gle.com> wrote:
>clang is generating very inefficient code for native_save_fl() which
>is used for local_irq_save() in critical spots.
>
>Allowing the "pop %0" to use memory:
>
>1) forces the compiler to add annoying stack canaries when
>   CONFIG_STACKPROTECTOR_STRONG=y in many places.
>
>2) Almost always is followed by an immediate "move memory,register"
>
>One good example is _raw_spin_lock_irqsave, with 8 extra instructions
>
>ffffffff82067a30 <_raw_spin_lock_irqsave>:
>ffffffff82067a30:		...
>ffffffff82067a39:		53						push   %rbx
>
>// Three instructions to ajust the stack, read the per-cpu canary
>// and copy it to 8(%rsp)
>ffffffff82067a3a:		48 83 ec 10 			sub    $0x10,%rsp
>ffffffff82067a3e:		65 48 8b 05 da 15 45 02 mov    %gs:0x24515da(%rip),%rax 	   # <__stack_chk_guard>
>ffffffff82067a46:		48 89 44 24 08			mov    %rax,0x8(%rsp)
>
>ffffffff82067a4b:		9c						pushf
>
>// instead of pop %rbx, compiler uses 2 instructions.
>ffffffff82067a4c:		8f 04 24				pop    (%rsp)
>ffffffff82067a4f:		48 8b 1c 24 			mov    (%rsp),%rbx
>
>ffffffff82067a53:		fa						cli
>ffffffff82067a54:		b9 01 00 00 00			mov    $0x1,%ecx
>ffffffff82067a59:		31 c0					xor    %eax,%eax
>ffffffff82067a5b:		f0 0f b1 0f 			lock cmpxchg %ecx,(%rdi)
>ffffffff82067a5f:		75 1d					jne    ffffffff82067a7e <_raw_spin_lock_irqsave+0x4e>
>
>// three instructions to check the stack canary
>ffffffff82067a61:		65 48 8b 05 b7 15 45 02 mov    %gs:0x24515b7(%rip),%rax 	   # <__stack_chk_guard>
>ffffffff82067a69:		48 3b 44 24 08			cmp    0x8(%rsp),%rax
>ffffffff82067a6e:		75 17					jne    ffffffff82067a87
>
>...
>
>// One extra instruction to adjust the stack.
>ffffffff82067a73:		48 83 c4 10 			add    $0x10,%rsp
>...
>
>// One more instruction in case the stack was mangled.
>ffffffff82067a87:		e8 a4 35 ff ff			call   ffffffff8205b030 <__stack_chk_fail>
>
>This patch changes nothing for gcc, but for clang saves ~20000 bytes of text
>even though more functions are inlined.
>
>$ size vmlinux.gcc.before vmlinux.gcc.after vmlinux.clang.before vmlinux.clang.after
>   text	   data		bss		dec		hex	filename
>45565821	25005462	4704800	75276083	47c9f33	vmlinux.gcc.before
>45565821	25005462	4704800	75276083	47c9f33	vmlinux.gcc.after
>45121072	24638617	5533040	75292729	47ce039	vmlinux.clang.before
>45093887	24638633	5536808	75269328	47c84d0	vmlinux.clang.after
>
>$ scripts/bloat-o-meter -t vmlinux.clang.before vmlinux.clang.after
>add/remove: 1/2 grow/shrink: 21/533 up/down: 2250/-22112 (-19862)
>Function									 old	 new   delta
>wakeup_cpu_via_vmgexit						1002	1447	+445
>rcu_tasks_trace_pregp_step					1052	1454	+402
>snp_kexec_finish							1290	1527	+237
>check_all_holdout_tasks_trace				 909	1106	+197
>x2apic_send_IPI_mask_allbutself 			  38	 198	+160
>hpet_set_rtc_irq_bit						 118	 265	+147
>x2apic_send_IPI_mask						  38	 184	+146
>ring_buffer_poll_wait						 261	 405	+144
>rb_watermark_hit							 253	 386	+133
>__unlikely_text_end 						 368	 416	 +48
>printk_trigger_flush						 262	 298	 +36
>__softirqentry_text_end 					   -	  32	 +32
>pstore_dump 								1145	1164	 +19
>printk_legacy_allow_panic_sync				 159	 178	 +19
>netlink_insert								 979	 995	 +16
>console_try_replay_all						 268	 283	 +15
>do_flush_tlb_all							 151	 165	 +14
>__flush_tlb_all 							 151	 165	 +14
>synchronize_rcu_expedited					2248	2259	 +11
>...
>tcp_wfree									 402	 332	 -70
>stacktrace_trigger							 133	  62	 -71
>w1_touch_bit								 418	 343	 -75
>w1_triplet									 446	 370	 -76
>link_create 								 980	 902	 -78
>drain_dead_softirq_workfn					 425	 347	 -78
>kcryptd_queue_crypt 						 253	 174	 -79
>perf_event_aux_pause						 448	 368	 -80
>idle_worker_timeout 						 320	 240	 -80
>srcu_funnel_exp_start						 418	 333	 -85
>call_rcu									 751	 666	 -85
>enable_IR_x2apic							 279	 191	 -88
>bpf_link_free								 432	 342	 -90
>synchronize_rcu 							 497	 403	 -94
>identify_cpu								2665	2569	 -96
>ftrace_modify_all_code						 355	 258	 -97
>load_gs_index								 212	 104	-108
>verity_end_io								 369	 257	-112
>bpf_prog_detach 							 672	 555	-117
>__x2apic_send_IPI_mask						 552	 275	-277
>snp_cleanup_vmsa							 284	   -	-284
>__noinstr_text_start						3072	1920   -1152
>Total: Before=28577936, After=28558074, chg -0.07%
>
>Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>Cc: Uros Bizjak <ubizjak@...il.com>
>Cc: Linus Torvalds <torvalds@...ux-foundation.org>
>---
>v2: use ASM_OUTPUT_RM (Uros Bizjak)
>v1: https://lore.kernel.org/lkml/CANn89iJ+HKXRn7qF4KrT6gghw6CwWcsvoj8Scw17CkCqhGbk=A@mail.gmail.com/T/#mc2322d458f07118580eca7c5fa1f0bc931c32d30
>
> arch/x86/include/asm/irqflags.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>index b30e5474c18e1be63b7c69354c26ae6a6cb02731..a1193e9d65f2000d6de88468bee58f2dae9c6cd5 100644
>--- a/arch/x86/include/asm/irqflags.h
>+++ b/arch/x86/include/asm/irqflags.h
>@@ -25,7 +25,7 @@ extern __always_inline unsigned long native_save_fl(void)
>	 */
>	asm volatile("# __raw_save_flags\n\t"
>			 "pushf ; pop %0"
>-			 : "=rm" (flags)
>+			 : ASM_OUTPUT_RM (flags)
>			 : /* no input */
>			 : "memory");
>
>--
>2.52.0.322.g1dd061c0dc-goog
>

Maybe they should provide an intrinsic for this operation?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ