Message-ID: <e20d88d0-5fb9-4307-be67-88b04ae9a188@roeck-us.net>
Date: Fri, 15 Mar 2024 09:17:14 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
Linus Torvalds <torvalds@...uxfoundation.org>,
Uros Bizjak <ubizjak@...il.com>, linux-sparse@...r.kernel.org,
lkp@...el.com, oe-kbuild-all@...ts.linux.dev
Subject: Re: [patch 5/9] x86: Cure per CPU madness on UP
Hi,
On Mon, Mar 04, 2024 at 11:12:23AM +0100, Thomas Gleixner wrote:
> On UP builds sparse complains rightfully about accesses to cpu_info with
> per CPU accessors:
>
> cacheinfo.c:282:30: sparse: warning: incorrect type in initializer (different address spaces)
> cacheinfo.c:282:30: sparse: expected void const [noderef] __percpu *__vpp_verify
> cacheinfo.c:282:30: sparse: got unsigned int *
>
> The reason is that on UP builds cpu_info which is a per CPU variable on SMP
> is mapped to boot_cpu_info which is a regular variable. There is a hideous
> accessor cpu_data() which tries to hide this, but it's not sufficient as
> some places require raw accessors and generates worse code than the regular
> per CPU accessors.
>
> Waste sizeof(struct x86_cpuinfo) memory on UP and provide the per CPU
> cpu_info unconditionally. This requires to update the CPU info on the boot
> CPU as SMP does. (Ab)use the weakly defined smp_prepare_boot_cpu() function
> and implement exactly that.
>
> This allows to use regular per CPU accessors unconditionally and paves the
> way to remove the cpu_data() hackery.
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
This patch results in crashes when running the mainline kernel in qemu
with nosmp builds and Intel CPUs. The problem is _not_ seen on tag
x86-cleanups-2024-03-11; it is only seen in the mainline kernel. I didn't
check all CPU models, but it looks like AMD CPUs are not affected. The
initial bisect points to the merge of x86-cleanups-2024-03-11 into the
mainline kernel. I then rebased x86-cleanups-2024-03-11 on top of the
mainline kernel and ran a second bisect on that rebased branch, which
points to this patch. Reverting this patch from the mainline kernel fixes
the problem.
I don't know the code well enough to determine what is wrong.
Please let me know what I can do to help debug the problem.
Thanks,
Guenter
----
crash log:
[ 3.291087] BUG: unable to handle page fault for address: ffff9cd801f3f2a0
[ 3.291087] #PF: supervisor write access in kernel mode
[ 3.291087] #PF: error_code(0x0002) - not-present page
[ 3.291087] PGD 60201067 P4D 60201067 PUD 0
[ 3.291087] Oops: 0002 [#1] PREEMPT PTI
[ 3.291087] CPU: 0 PID: 1 Comm: swapper Not tainted 6.8.0-06619-ge5e038b7ae9d #1
[ 3.291087] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
[ 3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff
[ 3.291087] RSP: 0018:ffffa3d54001fdd0 EFLAGS: 00000246
[ 3.291087] RAX: ffff9cd001f3f200 RBX: ffff9cd001fb34a8 RCX: 0000000000000000
[ 3.291087] RDX: 00000000ffffffed RSI: 0000000000000001 RDI: ffff9cd001fb3550
[ 3.291087] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[ 3.291087] R10: 0000000000000001 R11: 0000000000018001 R12: 0000000000000000
[ 3.291087] R13: 000000000000009e R14: ffffffffb6808180 R15: ffffffffb86710e5
[ 3.291087] FS: 0000000000000000(0000) GS:ffffffffb8ab0000(0000) knlGS:0000000000000000
[ 3.291087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.291087] CR2: ffff9cd801f3f2a0 CR3: 000000005e6a2000 CR4: 00000000001506f0
[ 3.291087] Call Trace:
[ 3.291087] <TASK>
[ 3.291087] ? __die+0x1f/0x60
[ 3.291087] ? page_fault_oops+0x148/0x460
[ 3.291087] ? search_exception_tables+0x37/0x50
[ 3.291087] ? fixup_exception+0x21/0x320
[ 3.291087] ? exc_page_fault+0xca/0x150
[ 3.291087] ? asm_exc_page_fault+0x26/0x30
[ 3.291087] ? __pfx_rapl_cpu_online+0x10/0x10
[ 3.291087] ? rapl_cpu_online+0xf2/0x110
[ 3.291087] cpuhp_invoke_callback.constprop.0+0x117/0x3e0
[ 3.291087] __cpuhp_setup_state_cpuslocked+0x1b7/0x280
[ 3.291087] ? __pfx_rapl_cpu_online+0x10/0x10
[ 3.291087] rapl_pmu_init+0x189/0x2e0
[ 3.291087] ? __pfx_rapl_pmu_init+0x10/0x10
[ 3.291087] do_one_initcall+0x4f/0x210
[ 3.291087] kernel_init_freeable+0x166/0x290
[ 3.291087] ? __pfx_kernel_init+0x10/0x10
[ 3.291087] kernel_init+0x15/0x1b0
[ 3.291087] ret_from_fork+0x2f/0x50
[ 3.291087] ? __pfx_kernel_init+0x10/0x10
[ 3.291087] ret_from_fork_asm+0x19/0x30
[ 3.291087] </TASK>
[ 3.291087] Modules linked in:
[ 3.291087] CR2: ffff9cd801f3f2a0
[ 3.291087] ---[ end trace 0000000000000000 ]---
[ 3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
[ 3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff
[ 3.291087] RSP: 0018:ffffa3d54001fdd0 EFLAGS: 00000246
[ 3.291087] RAX: ffff9cd001f3f200 RBX: ffff9cd001fb34a8 RCX: 0000000000000000
[ 3.291087] RDX: 00000000ffffffed RSI: 0000000000000001 RDI: ffff9cd001fb3550
[ 3.291087] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[ 3.291087] R10: 0000000000000001 R11: 0000000000018001 R12: 0000000000000000
[ 3.291087] R13: 000000000000009e R14: ffffffffb6808180 R15: ffffffffb86710e5
[ 3.291087] FS: 0000000000000000(0000) GS:ffffffffb8ab0000(0000) knlGS:0000000000000000
[ 3.291087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.291087] CR2: ffff9cd801f3f2a0 CR3: 000000005e6a2000 CR4: 00000000001506f0
[ 3.291087] note: swapper[1] exited with irqs disabled
[ 3.306047] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
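As a cross-check of the log above, the faulting address can be reconstructed from the register dump. The bytes at the marked position in the Code: line, 48 89 9c d0 38 01 00 00, decode to "mov %rbx,0x138(%rax,%rdx,8)", so the store targets RAX + RDX*8 + 0x138. The sketch below redoes that arithmetic with the logged RAX and RDX and compares the result with CR2; it also decodes the low three bits of the page fault error code. The helper names are made up for illustration. Reading RDX (0xffffffed) as a signed 32-bit value gives -19, which matches -ENODEV on Linux; that this is where the value comes from is only an assumption, not something the log proves.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, index, scale, disp):
    """x86 base + index*scale + displacement, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode_pf_error_code(code):
    """Decode the three low bits of an x86 page fault error code."""
    return {
        "present": bool(code & 0x1),  # 0 means not-present page
        "write": bool(code & 0x2),    # 1 means write access
        "user": bool(code & 0x4),     # 0 means supervisor (kernel) mode
    }


# Values copied from the register dump in the crash log.
rax = 0xFFFF9CD001F3F200
rdx = 0x00000000FFFFFFED
cr2 = 0xFFFF9CD801F3F2A0

addr = effective_address(rax, rdx, 8, 0x138)
print(hex(addr), addr == cr2)        # 0xffff9cd801f3f2a0 True
print(as_signed32(rdx))              # -19
print(decode_pf_error_code(0x0002))  # not present, write, supervisor
```

The computed address equals CR2, so the out-of-range index in RDX is what pushes the store beyond the object RAX points to and into an unmapped page.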
---
1st bisect:
# bad: [1bbeaf83dd7b5e3628b98bec66ff8fe2646e14aa] Merge tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect start 'HEAD' 'v6.8'
# bad: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 9187210eee7d87eea37b45ea93454a88681894a4
# bad: [a01c9fe32378636ae65bec8047b5de3fdb2ba5c8] Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect bad a01c9fe32378636ae65bec8047b5de3fdb2ba5c8
# bad: [691632f0e86973604678d193ccfa47b9614581aa] Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect bad 691632f0e86973604678d193ccfa47b9614581aa
# good: [8ede842f669b6f78812349bbef4d1efd0fbdafce] Merge tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux
git bisect good 8ede842f669b6f78812349bbef4d1efd0fbdafce
# good: [bfdb395a7cde12d83a623949ed029b0ab38d765b] Merge tag 'x86_mtrr_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good bfdb395a7cde12d83a623949ed029b0ab38d765b
# bad: [685d98211273f60e38a6d361b62d7016c545297e] Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 685d98211273f60e38a6d361b62d7016c545297e
# good: [b0402403e54ae9eb94ce1cbb53c7def776e97426] Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good b0402403e54ae9eb94ce1cbb53c7def776e97426
# good: [cb81deefb59de01325ab822f900c13941bfaf67f] x86/idle: Sanitize X86_BUG_AMD_E400 handling
git bisect good cb81deefb59de01325ab822f900c13941bfaf67f
# good: [73f0d1d7b4abb4a46bae1a0d8caf66e23d1138d0] Merge tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 73f0d1d7b4abb4a46bae1a0d8caf66e23d1138d0
# good: [65efc4dc12c5cc296374278673b89390eba79fe6] x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation
git bisect good 65efc4dc12c5cc296374278673b89390eba79fe6
# good: [d69ad12c786f0a4593c48c0658043aa4a5116b09] Merge tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d69ad12c786f0a4593c48c0658043aa4a5116b09
# good: [35ce64922c8263448e58a2b9e8d15a64e11e9b2d] x86/idle: Select idle routine only once
git bisect good 35ce64922c8263448e58a2b9e8d15a64e11e9b2d
# good: [774a86f1c885460ade4334b901919fa1d8ae6ec6] x86/nmi: Drop unused declaration of proc_nmi_enabled()
git bisect good 774a86f1c885460ade4334b901919fa1d8ae6ec6
# bad: [fcc196579aa1fc167d6778948bff69fae6116737] Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad fcc196579aa1fc167d6778948bff69fae6116737
# first bad commit: [fcc196579aa1fc167d6778948bff69fae6116737] Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
---
2nd bisect:
# bad: [6d70929c7019e265425f7a89cf72163a639d462e] x86/nmi: Drop unused declaration of proc_nmi_enabled()
# good: [d69ad12c786f0a4593c48c0658043aa4a5116b09] Merge tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect start 'HEAD' 'fcc196579aa1fc167d6778948bff69fae6116737~1'
# good: [5c157d25918ef15454c2a9a9262f7b763d9c9add] x86/msr: Add missing __percpu annotations
git bisect good 5c157d25918ef15454c2a9a9262f7b763d9c9add
# bad: [f5a6b1e2d92d825a0f73ca11e960795da336d99e] x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address()
git bisect bad f5a6b1e2d92d825a0f73ca11e960795da336d99e
# bad: [68907233f6d26a214bb79f892db8d999b15dda97] x86/percpu: Cure per CPU madness on UP
git bisect bad 68907233f6d26a214bb79f892db8d999b15dda97
# good: [053df18ceb928c8631042317a884b2842a457f1b] smp: Consolidate smp_prepare_boot_cpu()
git bisect good 053df18ceb928c8631042317a884b2842a457f1b
# first bad commit: [68907233f6d26a214bb79f892db8d999b15dda97] x86/percpu: Cure per CPU madness on UP