lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250506092445.GBaBnVXXyvnazly6iF@fat_crate.local>
Date: Tue, 6 May 2025 11:24:45 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
	Ingo Molnar <mingo@...nel.org>, Arnd Bergmann <arnd@...db.de>,
	David Woodhouse <dwmw@...zon.co.uk>,
	Dionna Amalie Glaze <dionnaglaze@...gle.com>,
	"H. Peter Anvin" <hpa@...or.com>, Kees Cook <keescook@...omium.org>,
	Kevin Loughlin <kevinloughlin@...gle.com>,
	Len Brown <len.brown@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Tom Lendacky <thomas.lendacky@....com>, linux-efi@...r.kernel.org,
	x86@...nel.org, Ard Biesheuvel <ardb@...nel.org>
Subject: 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in
 __accept_page()") (was: Re: [tip: x86/boot] x86/boot: Provide
 __pti_set_user_pgtbl() to startup code)

Yah,

leaving the rest below untouched for background info how we ended up here.

So tglx and I did some tracing just now:

...
[    1.156432] smpboot: x86: Booting SMP configuration:
[    1.157267] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31
[    1.274198] smp: Brought up 1 node, 32 CPUs
[    1.277842] smpboot: Total of 32 processors activated (191999.87 BogoMIPS)
[    1.289879] ------------[ cut here ]------------
[    1.290682] v: -1

The key is -1 and the code *actually* already says what happens:

                /*
                 * Warn about the '-1' case though; since that means a
                 * decrement is concurrent with a first (0->1) increment. IOW
                 * people are trying to disable something that wasn't yet fully
                 * enabled. This suggests an ordering problem on the user side.
                 */

Keep on reading...

[    1.290688] WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.293850] Modules linked in:
[    1.294428] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0-rc5+ #8 PREEMPT(voluntary) 
[    1.296050] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[    1.297841] Workqueue: events unaccepted_cleanup_work
[    1.298766] RIP: 0010:__static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.299866] Code: 48 c7 c7 a0 52 54 82 5b e9 8f 42 8c 00 0f 0b 5b e9 f7 ba 8c 00 89 c6 48 c7 c7 19 63 2e 82 c6 05 8c 19 19 01 01 e8 22 54 e5 ff <0f> 0b eb a1 0f 0b eb b9 0f 0b eb b5 66 0f 1f 44 00 00 90 90 90 90
[    1.301840] RSP: 0018:ffffc90000067e30 EFLAGS: 00010282
[    1.302787] RAX: 0000000000000000 RBX: ffffffff82f12d00 RCX: 0000000000000000
[    1.305838] RDX: 0000000000000002 RSI: ffffc90000067cc8 RDI: 00000000ffffffff
[    1.307144] RBP: ffff88827b7d4d98 R08: 00000000fff7ffff R09: ffff88827e3fdfa8
[    1.308449] R10: ffff88827b8035f0 R11: 0000000000000001 R12: ffff888272e2e740
[    1.309838] R13: ffff8881008e1940 R14: ffff888100308a05 R15: ffff888100308a00
[    1.311144] FS:  0000000000000000(0000) GS:ffff8882effbe000(0000) knlGS:0000000000000000
[    1.312613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.313838] CR2: ffff88827fe00000 CR3: 0008000002c22000 CR4: 00000000003506f0
[    1.315146] Call Trace:
[    1.315608]  <TASK>
[    1.316012]  static_key_slow_dec+0x1f/0x40
[    1.316792]  ? unaccepted_cleanup_work+0x1a/0x50
[    1.317691]  unaccepted_cleanup_work+0x2e/0x50
[    1.317839]  process_one_work+0x171/0x330
[    1.318587]  worker_thread+0x247/0x390
[    1.319282]  ? _raw_spin_lock_irqsave+0x23/0x50
[    1.320136]  ? srso_return_thunk+0x5/0x5f
[    1.320924]  ? __pfx_worker_thread+0x10/0x10
[    1.321838]  kthread+0x107/0x240
[    1.322514]  ? __pfx_kthread+0x10/0x10
[    1.323212]  ? __pfx_kthread+0x10/0x10
[    1.322040] node 0 deferred pages initialised in 40ms
[    1.323867]  ret_from_fork+0x87/0xf0
[    1.323872]  ? __pfx_kthread+0x10/0x10
[    1.326505]  ret_from_fork_asm+0x1a/0x30
[    1.327211]  </TASK>
[    1.327613] Kernel panic - not syncing: kernel: panic_on_warn set ...
[    1.328772] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0-rc5+ #8 PREEMPT(voluntary) 
[    1.329836] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[    1.329836] Workqueue: events unaccepted_cleanup_work
[    1.329836] Call Trace:
[    1.329836]  <TASK>
[    1.329836]  dump_stack_lvl+0x56/0x80
[    1.329836]  panic+0x106/0x2c9
[    1.329836]  ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.329836]  check_panic_on_warn.cold+0xf/0x1e
[    1.329836]  __warn.cold+0x9f/0xf8
[    1.329836]  ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.329836]  report_bug+0xe6/0x170
[    1.329836]  ? prb_read_valid+0x17/0x20
[    1.329836]  ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.329836]  handle_bug+0x199/0x260
[    1.329836]  exc_invalid_op+0x13/0x60
[    1.329836]  asm_exc_invalid_op+0x16/0x20
[    1.329836] RIP: 0010:__static_key_slow_dec_cpuslocked+0x8e/0xa0
[    1.329836] Code: 48 c7 c7 a0 52 54 82 5b e9 8f 42 8c 00 0f 0b 5b e9 f7 ba 8c 00 89 c6 48 c7 c7 19 63 2e 82 c6 05 8c 19 19 01 01 e8 22 54 e5 ff <0f> 0b eb a1 0f 0b eb b9 0f 0b eb b5 66 0f 1f 44 00 00 90 90 90 90
[    1.329836] RSP: 0018:ffffc90000067e30 EFLAGS: 00010282
[    1.329836] RAX: 0000000000000000 RBX: ffffffff82f12d00 RCX: 0000000000000000
[    1.329836] RDX: 0000000000000002 RSI: ffffc90000067cc8 RDI: 00000000ffffffff
[    1.329836] RBP: ffff88827b7d4d98 R08: 00000000fff7ffff R09: ffff88827e3fdfa8
[    1.329836] R10: ffff88827b8035f0 R11: 0000000000000001 R12: ffff888272e2e740
[    1.329836] R13: ffff8881008e1940 R14: ffff888100308a05 R15: ffff888100308a00
[    1.329836]  static_key_slow_dec+0x1f/0x40
[    1.329836]  ? unaccepted_cleanup_work+0x1a/0x50
[    1.329836]  unaccepted_cleanup_work+0x2e/0x50
[    1.329836]  process_one_work+0x171/0x330
[    1.329836]  worker_thread+0x247/0x390
[    1.329836]  ? _raw_spin_lock_irqsave+0x23/0x50
[    1.329836]  ? srso_return_thunk+0x5/0x5f
[    1.329836]  ? __pfx_worker_thread+0x10/0x10
[    1.329836]  kthread+0x107/0x240
[    1.329836]  ? __pfx_kthread+0x10/0x10
[    1.329836]  ? __pfx_kthread+0x10/0x10
[    1.329836]  ret_from_fork+0x87/0xf0
[    1.329836]  ? __pfx_kthread+0x10/0x10
[    1.329836]  ret_from_fork_asm+0x1a/0x30
[    1.329836]  </TASK>
[    1.329836] Dumping ftrace buffer:
[    1.329836] ---------------------------------
[    1.329836] pgdatini-180      29..... 1546876us : __free_pages_core: will INC: increment branch key, key: 0

CPU29 comes in and decides to INC the key.

[    1.329836] pgdatini-180      29..... 1546896us : <stack trace>
[    1.329836]  => padata_mt_helper
[    1.329836]  => padata_do_multithreaded
[    1.329836]  => deferred_init_memmap
[    1.329836]  => kthread
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] kthreadd-2         0.N... 1552367us : __accept_page: schedule unaccepted_cleanup_work()
[    1.329836] kthreadd-2         0.N... 1552385us : <stack trace>

CPU0 gets in-between and schedules work...

[    1.329836]  => alloc_pages_bulk_noprof
[    1.329836]  => __vmalloc_node_range_noprof
[    1.329836]  => __vmalloc_node_noprof
[    1.329836]  => copy_process
[    1.329836]  => kernel_clone
[    1.329836]  => kernel_thread
[    1.329836]  => kthreadd
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] kworker/-10        0..... 1552395us : unaccepted_cleanup_work: will DEC: unaccepted_cleanup_work(), key: -1

... and wants to decrement *while* the increment is in progress. Without
holding the jump label mutex.

Pooof.

[    1.329836] kworker/-10        0..... 1552398us : <stack trace>
[    1.329836]  => kthread
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] pgdatini-180      29.N... 1553206us : __free_pages_core: done INC: increment branch key, key: 1

CPU29 gets to increment the key finally.

[    1.329836] pgdatini-180      29.N... 1553213us : <stack trace>
[    1.329836]  => padata_do_multithreaded
[    1.329836]  => deferred_init_memmap
[    1.329836]  => kthread
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] pgdatini-180      29..... 1553227us : __free_pages_core: will INC: increment branch key, key: 1
[    1.329836] pgdatini-180      29..... 1553232us : <stack trace>
[    1.329836]  => padata_mt_helper
[    1.329836]  => padata_do_multithreaded
[    1.329836]  => deferred_init_memmap
[    1.329836]  => kthread
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] pgdatini-180      29..... 1553234us : __free_pages_core: done INC: increment branch key, key: 2
[    1.329836] pgdatini-180      29..... 1553238us : <stack trace>
[    1.329836]  => padata_do_multithreaded
[    1.329836]  => deferred_init_memmap
[    1.329836]  => kthread
[    1.329836]  => ret_from_fork
[    1.329836]  => ret_from_fork_asm
[    1.329836] ---------------------------------
[    1.329836] ---[ end Kernel panic - not syncing: kernel: panic_on_warn set ... ]---

Fun stuff.

On Tue, May 06, 2025 at 09:18:00AM +0200, Borislav Petkov wrote:
> + Kirill.
> 
> This looks like this unaccepted_cleanup_work() thing being unsynchronized...
> 
> On Mon, May 05, 2025 at 06:47:59PM +0200, Borislav Petkov wrote:
> > On Mon, May 05, 2025 at 06:19:49PM +0200, Ard Biesheuvel wrote:
> > > This patch by itself does nothing. The symbol is __weak for the time
> > > being, and given that this code does not have its __pi_ prefixes yet,
> > > this function will be superseded by the existing one. (If you remove
> > > the __weak you will get a linker error)
> > > 
> > > Are you sure this patch is causing the issue?
> > 
> > My by-foot bisection of x86/boot is below.
> > 
> > I don't know if this patch uncovers something or maybe changes placement...
> > 
> > Tom also pointed to
> > 
> > 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()")
> > 
> > as potentially related.
> > 
> > And now I went and tried to reproduce the warning and it fired ontop of:
> > 
> > 419cbaf6a56a ("x86/boot: Add a bunch of PIC aliases")
> > 
> > which is the previous patch.
> > 
> > Ufff, this is one of those which don't reproduce reliably because it was fine
> > on that commit in the previous run. Nasty.
> > 
> > Lemme go dig more.
> > 
> > ---
> > 
> > ed4d95d033e3 (HEAD, refs/remotes/tip/x86/boot) x86/sev: Disentangle #VC handling code from startup code
> > 
> > <--- NOT
> > 
> > [    1.227968] smp: Brought up 1 node, 32 CPUs
> > [    1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS)
> > [    1.247644] ------------[ cut here ]------------
> > [    1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80
> > [    1.251592] Modules linked in:
> > [    1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary) 
> > [    1.253173] node 0 deferred pages initialised in 12ms
> > [    1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> > [    1.257490] Workqueue: events unaccepted_cleanup_work
> > [    1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80
> > [    1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85
> > [    1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved)
> > [    1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286
> > [    1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000
> > [    1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00
> > [    1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100
> > [    1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300
> > [    1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400
> > [    1.281460] FS:  0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000
> > [    1.281460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0
> > [    1.287576] Call Trace:
> > [    1.288381]  <TASK>
> > [    1.289015]  static_key_slow_dec+0x1f/0x40
> > [    1.289980]  process_one_work+0x171/0x330
> > [    1.290988]  worker_thread+0x247/0x390
> > [    1.291576]  ? __pfx_worker_thread+0x10/0x10
> > [    1.292773]  kthread+0x107/0x240
> > [    1.293536]  ? __pfx_kthread+0x10/0x10
> > [    1.295573]  ret_from_fork+0x30/0x50
> > [    1.295575]  ? __pfx_kthread+0x10/0x10
> > [    1.296580]  ret_from_fork_asm+0x1a/0x30
> > [    1.297507]  </TASK>
> > [    1.298085] ---[ end trace 0000000000000000 ]---
> > 
> > 5297886f0cc4 x86/boot: Provide __pti_set_user_pgtbl() to startup code
> > 
> > <--- OK
> > 
> > 419cbaf6a56a x86/boot: Add a bunch of PIC aliases
> > f932adcc8650 x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases
> > 
> > <--- OK
> > 
> > ae862964cbc5 x86/sev: Move instruction decoder into separate source file
> > fae89bbfdd9d x86/sev: Make sev_snp_enabled() a static function
> > b3464a36f7f2 x86/boot: Disregard __supported_pte_mask in __startup_64()
> > bd4a58beaaf1 x86/boot: Move early_setup_gdt() back into head64.c
> > 
> > <--- OK
> > 
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > https://people.kernel.org/tglx/notes-about-netiquette
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ