[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250506092445.GBaBnVXXyvnazly6iF@fat_crate.local>
Date: Tue, 6 May 2025 11:24:45 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
Ingo Molnar <mingo@...nel.org>, Arnd Bergmann <arnd@...db.de>,
David Woodhouse <dwmw@...zon.co.uk>,
Dionna Amalie Glaze <dionnaglaze@...gle.com>,
"H. Peter Anvin" <hpa@...or.com>, Kees Cook <keescook@...omium.org>,
Kevin Loughlin <kevinloughlin@...gle.com>,
Len Brown <len.brown@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Tom Lendacky <thomas.lendacky@....com>, linux-efi@...r.kernel.org,
x86@...nel.org, Ard Biesheuvel <ardb@...nel.org>
Subject: 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in
__accept_page()") (was: Re: [tip: x86/boot] x86/boot: Provide
__pti_set_user_pgtbl() to startup code)
Yah,
leaving the rest below untouched for background info how we ended up here.
So tglx and I did some tracing just now:
...
[ 1.156432] smpboot: x86: Booting SMP configuration:
[ 1.157267] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31
[ 1.274198] smp: Brought up 1 node, 32 CPUs
[ 1.277842] smpboot: Total of 32 processors activated (191999.87 BogoMIPS)
[ 1.289879] ------------[ cut here ]------------
[ 1.290682] v: -1
The key is -1 and the code *actually* already says what happens:
/*
* Warn about the '-1' case though; since that means a
* decrement is concurrent with a first (0->1) increment. IOW
* people are trying to disable something that wasn't yet fully
* enabled. This suggests an ordering problem on the user side.
*/
Keep on reading...
[ 1.290688] WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.293850] Modules linked in:
[ 1.294428] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0-rc5+ #8 PREEMPT(voluntary)
[ 1.296050] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 1.297841] Workqueue: events unaccepted_cleanup_work
[ 1.298766] RIP: 0010:__static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.299866] Code: 48 c7 c7 a0 52 54 82 5b e9 8f 42 8c 00 0f 0b 5b e9 f7 ba 8c 00 89 c6 48 c7 c7 19 63 2e 82 c6 05 8c 19 19 01 01 e8 22 54 e5 ff <0f> 0b eb a1 0f 0b eb b9 0f 0b eb b5 66 0f 1f 44 00 00 90 90 90 90
[ 1.301840] RSP: 0018:ffffc90000067e30 EFLAGS: 00010282
[ 1.302787] RAX: 0000000000000000 RBX: ffffffff82f12d00 RCX: 0000000000000000
[ 1.305838] RDX: 0000000000000002 RSI: ffffc90000067cc8 RDI: 00000000ffffffff
[ 1.307144] RBP: ffff88827b7d4d98 R08: 00000000fff7ffff R09: ffff88827e3fdfa8
[ 1.308449] R10: ffff88827b8035f0 R11: 0000000000000001 R12: ffff888272e2e740
[ 1.309838] R13: ffff8881008e1940 R14: ffff888100308a05 R15: ffff888100308a00
[ 1.311144] FS: 0000000000000000(0000) GS:ffff8882effbe000(0000) knlGS:0000000000000000
[ 1.312613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.313838] CR2: ffff88827fe00000 CR3: 0008000002c22000 CR4: 00000000003506f0
[ 1.315146] Call Trace:
[ 1.315608] <TASK>
[ 1.316012] static_key_slow_dec+0x1f/0x40
[ 1.316792] ? unaccepted_cleanup_work+0x1a/0x50
[ 1.317691] unaccepted_cleanup_work+0x2e/0x50
[ 1.317839] process_one_work+0x171/0x330
[ 1.318587] worker_thread+0x247/0x390
[ 1.319282] ? _raw_spin_lock_irqsave+0x23/0x50
[ 1.320136] ? srso_return_thunk+0x5/0x5f
[ 1.320924] ? __pfx_worker_thread+0x10/0x10
[ 1.321838] kthread+0x107/0x240
[ 1.322514] ? __pfx_kthread+0x10/0x10
[ 1.323212] ? __pfx_kthread+0x10/0x10
[ 1.322040] node 0 deferred pages initialised in 40ms
[ 1.323867] ret_from_fork+0x87/0xf0
[ 1.323872] ? __pfx_kthread+0x10/0x10
[ 1.326505] ret_from_fork_asm+0x1a/0x30
[ 1.327211] </TASK>
[ 1.327613] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 1.328772] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0-rc5+ #8 PREEMPT(voluntary)
[ 1.329836] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 1.329836] Workqueue: events unaccepted_cleanup_work
[ 1.329836] Call Trace:
[ 1.329836] <TASK>
[ 1.329836] dump_stack_lvl+0x56/0x80
[ 1.329836] panic+0x106/0x2c9
[ 1.329836] ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.329836] check_panic_on_warn.cold+0xf/0x1e
[ 1.329836] __warn.cold+0x9f/0xf8
[ 1.329836] ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.329836] report_bug+0xe6/0x170
[ 1.329836] ? prb_read_valid+0x17/0x20
[ 1.329836] ? __static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.329836] handle_bug+0x199/0x260
[ 1.329836] exc_invalid_op+0x13/0x60
[ 1.329836] asm_exc_invalid_op+0x16/0x20
[ 1.329836] RIP: 0010:__static_key_slow_dec_cpuslocked+0x8e/0xa0
[ 1.329836] Code: 48 c7 c7 a0 52 54 82 5b e9 8f 42 8c 00 0f 0b 5b e9 f7 ba 8c 00 89 c6 48 c7 c7 19 63 2e 82 c6 05 8c 19 19 01 01 e8 22 54 e5 ff <0f> 0b eb a1 0f 0b eb b9 0f 0b eb b5 66 0f 1f 44 00 00 90 90 90 90
[ 1.329836] RSP: 0018:ffffc90000067e30 EFLAGS: 00010282
[ 1.329836] RAX: 0000000000000000 RBX: ffffffff82f12d00 RCX: 0000000000000000
[ 1.329836] RDX: 0000000000000002 RSI: ffffc90000067cc8 RDI: 00000000ffffffff
[ 1.329836] RBP: ffff88827b7d4d98 R08: 00000000fff7ffff R09: ffff88827e3fdfa8
[ 1.329836] R10: ffff88827b8035f0 R11: 0000000000000001 R12: ffff888272e2e740
[ 1.329836] R13: ffff8881008e1940 R14: ffff888100308a05 R15: ffff888100308a00
[ 1.329836] static_key_slow_dec+0x1f/0x40
[ 1.329836] ? unaccepted_cleanup_work+0x1a/0x50
[ 1.329836] unaccepted_cleanup_work+0x2e/0x50
[ 1.329836] process_one_work+0x171/0x330
[ 1.329836] worker_thread+0x247/0x390
[ 1.329836] ? _raw_spin_lock_irqsave+0x23/0x50
[ 1.329836] ? srso_return_thunk+0x5/0x5f
[ 1.329836] ? __pfx_worker_thread+0x10/0x10
[ 1.329836] kthread+0x107/0x240
[ 1.329836] ? __pfx_kthread+0x10/0x10
[ 1.329836] ? __pfx_kthread+0x10/0x10
[ 1.329836] ret_from_fork+0x87/0xf0
[ 1.329836] ? __pfx_kthread+0x10/0x10
[ 1.329836] ret_from_fork_asm+0x1a/0x30
[ 1.329836] </TASK>
[ 1.329836] Dumping ftrace buffer:
[ 1.329836] ---------------------------------
[ 1.329836] pgdatini-180 29..... 1546876us : __free_pages_core: will INC: increment branch key, key: 0
CPU29 comes in and decides to INC the key.
[ 1.329836] pgdatini-180 29..... 1546896us : <stack trace>
[ 1.329836] => padata_mt_helper
[ 1.329836] => padata_do_multithreaded
[ 1.329836] => deferred_init_memmap
[ 1.329836] => kthread
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] kthreadd-2 0.N... 1552367us : __accept_page: schedule unaccepted_cleanup_work()
[ 1.329836] kthreadd-2 0.N... 1552385us : <stack trace>
CPU0 gets in-between and schedules work...
[ 1.329836] => alloc_pages_bulk_noprof
[ 1.329836] => __vmalloc_node_range_noprof
[ 1.329836] => __vmalloc_node_noprof
[ 1.329836] => copy_process
[ 1.329836] => kernel_clone
[ 1.329836] => kernel_thread
[ 1.329836] => kthreadd
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] kworker/-10 0..... 1552395us : unaccepted_cleanup_work: will DEC: unaccepted_cleanup_work(), key: -1
... and wants to decrement *while* the increment is in progress. Without
holding the jump label mutex.
Pooof.
[ 1.329836] kworker/-10 0..... 1552398us : <stack trace>
[ 1.329836] => kthread
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] pgdatini-180 29.N... 1553206us : __free_pages_core: done INC: increment branch key, key: 1
CPU29 gets to increment the key finally.
[ 1.329836] pgdatini-180 29.N... 1553213us : <stack trace>
[ 1.329836] => padata_do_multithreaded
[ 1.329836] => deferred_init_memmap
[ 1.329836] => kthread
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] pgdatini-180 29..... 1553227us : __free_pages_core: will INC: increment branch key, key: 1
[ 1.329836] pgdatini-180 29..... 1553232us : <stack trace>
[ 1.329836] => padata_mt_helper
[ 1.329836] => padata_do_multithreaded
[ 1.329836] => deferred_init_memmap
[ 1.329836] => kthread
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] pgdatini-180 29..... 1553234us : __free_pages_core: done INC: increment branch key, key: 2
[ 1.329836] pgdatini-180 29..... 1553238us : <stack trace>
[ 1.329836] => padata_do_multithreaded
[ 1.329836] => deferred_init_memmap
[ 1.329836] => kthread
[ 1.329836] => ret_from_fork
[ 1.329836] => ret_from_fork_asm
[ 1.329836] ---------------------------------
[ 1.329836] ---[ end Kernel panic - not syncing: kernel: panic_on_warn set ... ]---
Fun stuff.
On Tue, May 06, 2025 at 09:18:00AM +0200, Borislav Petkov wrote:
> + Kirill.
>
> This looks like this unaccepted_cleanup_work() thing being unsynchronized...
>
> On Mon, May 05, 2025 at 06:47:59PM +0200, Borislav Petkov wrote:
> > On Mon, May 05, 2025 at 06:19:49PM +0200, Ard Biesheuvel wrote:
> > > This patch by itself does nothing. The symbol is __weak for the time
> > > being, and given that this code does not have its __pi_ prefixes yet,
> > > this function will be superseded by the existing one. (If you remove
> > > the __weak you will get a linker error)
> > >
> > > Are you sure this patch is causing the issue?
> >
> > My by-foot bisection of x86/boot is below.
> >
> > I don't know if this patch uncovers something or maybe changes placement...
> >
> > Tom also pointed to
> >
> > 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()")
> >
> > as potentially related.
> >
> > And now I went and tried to reproduce the warning and it fired ontop of:
> >
> > 419cbaf6a56a ("x86/boot: Add a bunch of PIC aliases")
> >
> > which is the previous patch.
> >
> > Ufff, this is one of those which don't reproduce reliably because it was fine
> > on that commit in the previous run. Nasty.
> >
> > Lemme go dig more.
> >
> > ---
> >
> > ed4d95d033e3 (HEAD, refs/remotes/tip/x86/boot) x86/sev: Disentangle #VC handling code from startup code
> >
> > <--- NOT
> >
> > [ 1.227968] smp: Brought up 1 node, 32 CPUs
> > [ 1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS)
> > [ 1.247644] ------------[ cut here ]------------
> > [ 1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80
> > [ 1.251592] Modules linked in:
> > [ 1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary)
> > [ 1.253173] node 0 deferred pages initialised in 12ms
> > [ 1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
> > [ 1.257490] Workqueue: events unaccepted_cleanup_work
> > [ 1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80
> > [ 1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85
> > [ 1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved)
> > [ 1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286
> > [ 1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000
> > [ 1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00
> > [ 1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100
> > [ 1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300
> > [ 1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400
> > [ 1.281460] FS: 0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000
> > [ 1.281460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0
> > [ 1.287576] Call Trace:
> > [ 1.288381] <TASK>
> > [ 1.289015] static_key_slow_dec+0x1f/0x40
> > [ 1.289980] process_one_work+0x171/0x330
> > [ 1.290988] worker_thread+0x247/0x390
> > [ 1.291576] ? __pfx_worker_thread+0x10/0x10
> > [ 1.292773] kthread+0x107/0x240
> > [ 1.293536] ? __pfx_kthread+0x10/0x10
> > [ 1.295573] ret_from_fork+0x30/0x50
> > [ 1.295575] ? __pfx_kthread+0x10/0x10
> > [ 1.296580] ret_from_fork_asm+0x1a/0x30
> > [ 1.297507] </TASK>
> > [ 1.298085] ---[ end trace 0000000000000000 ]---
> >
> > 5297886f0cc4 x86/boot: Provide __pti_set_user_pgtbl() to startup code
> >
> > <--- OK
> >
> > 419cbaf6a56a x86/boot: Add a bunch of PIC aliases
> > f932adcc8650 x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases
> >
> > <--- OK
> >
> > ae862964cbc5 x86/sev: Move instruction decoder into separate source file
> > fae89bbfdd9d x86/sev: Make sev_snp_enabled() a static function
> > b3464a36f7f2 x86/boot: Disregard __supported_pte_mask in __startup_64()
> > bd4a58beaaf1 x86/boot: Move early_setup_gdt() back into head64.c
> >
> > <--- OK
> >
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists