Message-ID: <58b4f89a-63c2-b8bd-4414-fbc312c52697@redhat.com>
Date:   Mon, 4 Dec 2017 07:28:02 -0500
From:   Prarit Bhargava <prarit@...hat.com>
To:     Jakub Kicinski <kubakici@...pl>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [bisected] x86 boot still broken on -rc2
On 12/03/2017 08:28 PM, Jakub Kicinski wrote:
> Same thing on rc2, bisected down to:
> 
> commit b4c0a7326f5dc0ef7a64128b0ae7d081f4b2cbd1 (refs/bisect/bad)
> Author: Prarit Bhargava <prarit@...hat.com>
> Date:   Tue Nov 14 07:42:57 2017 -0500
> 
>     x86/smpboot: Fix __max_logical_packages estimate
>     
>     A system booted with a small number of cores enabled per package
>     panics because the estimate of __max_logical_packages is too low.
>     
>     This occurs when the total number of active cores across all packages is
>     less than the maximum core count for a single package. e.g.:
>     
>       On a 4 package system with 20 cores/package where only 4 cores are
>       enabled on each package, the value of __max_logical_packages is
>       calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>     
>     Calculate __max_logical_packages after the cpu enumeration has completed.
>     Use the boot cpu's data to extrapolate the number of packages.
>     
>     Signed-off-by: Prarit Bhargava <prarit@...hat.com>
>     Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>     Cc: Tom Lendacky <thomas.lendacky@....com>
>     Cc: Andi Kleen <ak@...ux.intel.com>
>     Cc: Christian Borntraeger <borntraeger@...ibm.com>
>     Cc: Peter Zijlstra <peterz@...radead.org>
>     Cc: Kan Liang <kan.liang@...el.com>
>     Cc: He Chen <he.chen@...ux.intel.com>
>     Cc: Stephane Eranian <eranian@...gle.com>
>     Cc: Dave Hansen <dave.hansen@...el.com>
>     Cc: Piotr Luc <piotr.luc@...el.com>
>     Cc: Andy Lutomirski <luto@...nel.org>
>     Cc: Arvind Yadav <arvind.yadav.cs@...il.com>
>     Cc: Vitaly Kuznetsov <vkuznets@...hat.com>
>     Cc: Borislav Petkov <bp@...e.de>
>     Cc: Tim Chen <tim.c.chen@...ux.intel.com>
>     Cc: Mathias Krause <minipli@...glemail.com>
>     Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
>     Link: https://lkml.kernel.org/r/20171114124257.22013-4-prarit@redhat.com
> 
> 
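The arithmetic in the commit message's example can be checked with a short sketch. It is only an illustration, not the kernel code: `div_round_up` is a local helper written to mirror the usual `DIV_ROUND_UP(n, d)` ceiling division, the variable names are made up for this example, and the "new" estimate assumes that extrapolating from the boot CPU's data amounts to dividing by the cores actually enabled per package. SMT siblings are ignored to keep the numbers identical to the commit message.

```python
def div_round_up(n: int, d: int) -> int:
    """Integer ceiling division, written to mirror DIV_ROUND_UP(n, d)."""
    return (n + d - 1) // d


# Numbers taken from the example in the commit message.
packages = 4
max_cores_per_package = 20
enabled_cores_per_package = 4

total_active_cores = packages * enabled_cores_per_package  # 16

# Old estimate: active cores divided by the maximum core count of one package.
old_estimate = div_round_up(total_active_cores, max_cores_per_package)

# Assumed reading of the fix: divide by the cores actually enabled per package.
new_estimate = div_round_up(total_active_cores, enabled_cores_per_package)

print(old_estimate, new_estimate)  # 1 4
```

With these inputs the old calculation yields 1 package while the system really has 4, which is the underestimate the commit message describes.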
> On Fri, 1 Dec 2017 16:39:54 -0800, Jakub Kicinski wrote:
>> Hi!
>>
>> I'm hitting these after DaveM pulled rc1 into net-next on my Xeon
>> E5-2630 v4 box.  It also happens on linux-next.  Did anyone else
>> experience it?  (.config attached)
>>
>> [    5.003771] WARNING: CPU: 14 PID: 1 at ../arch/x86/events/intel/uncore.c:936 uncore_pci_probe+0x285/0x2b0
>> [    5.007544] Modules linked in:
>> [    5.007544] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [    5.007544] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016

I have a Dell R730 available for use.  Out of curiosity, are you booting
with the default BIOS options?

P.

>> [    5.007544] task: 000000009e842725 task.stack: 000000008a63fd2d
>> [    5.007544] RIP: 0010:uncore_pci_probe+0x285/0x2b0
>> [    5.007544] RSP: 0000:ffffad8580163d10 EFLAGS: 00010286
>> [    5.007544] RAX: ffff98576cc3df30 RBX: ffffffffb08037e0 RCX: ffffffffb0c1a120
>> [    5.007544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb0c1a960
>> [    5.007544] RBP: ffff985b6c00ac00 R08: fffffffffffffffe R09: 00000000000fffff
>> [    5.007544] R10: ffff98576f1b6018 R11: 0000000000000022 R12: ffff985b6c641000
>> [    5.007544] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
>> [    5.007544] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
>> [    5.007544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    5.007544] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
>> [    5.007544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    5.007544] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    5.007544] Call Trace:
>> [    5.007544]  local_pci_probe+0x3d/0x90
>> [    5.007544]  ? pci_match_device+0xd9/0x100
>> [    5.007544]  pci_device_probe+0x122/0x180
>> [    5.007544]  driver_probe_device+0x246/0x330
>> [    5.007544]  ? set_debug_rodata+0x11/0x11
>> [    5.007544]  __driver_attach+0x8a/0x90
>> [    5.007544]  ? driver_probe_device+0x330/0x330
>> [    5.007544]  bus_for_each_dev+0x5c/0x90
>> [    5.007544]  bus_add_driver+0x196/0x220
>> [    5.007544]  driver_register+0x57/0xc0
>> [    5.007544]  intel_uncore_init+0x1e3/0x249
>> [    5.007544]  ? uncore_type_init+0x193/0x193
>> [    5.007544]  ? set_debug_rodata+0x11/0x11
>> [    5.007544]  do_one_initcall+0x4b/0x190
>> [    5.007544]  kernel_init_freeable+0x16e/0x1f5
>> [    5.007544]  ? rest_init+0xd0/0xd0
>> [    5.007544]  kernel_init+0xa/0x100
>> [    5.007544]  ret_from_fork+0x1f/0x30
>> [    5.007544] Code: 48 8b 52 08 48 85 d2 74 0d 89 44 24 04 48 89 df ff d2 8b 44 24 04 48 89 df 89 44 24 04 e8 54 0a 1c 00 8b 44 24 0 
>> [    5.007544] ---[ end trace 4dc4c3d5f5afcd2f ]---
>> [    5.244504] bdx_uncore: probe of 0000:ff:08.2 failed with error -22
>> [    5.251604] bdx_uncore: probe of 0000:ff:0b.1 failed with error -22
>> [    5.258711] bdx_uncore: probe of 0000:ff:10.1 failed with error -22
>> [    5.265819] bdx_uncore: probe of 0000:ff:14.0 failed with error -22
>> [    5.272919] bdx_uncore: probe of 0000:ff:14.1 failed with error -22
>> [    5.280019] bdx_uncore: probe of 0000:ff:15.0 failed with error -22
>> [    5.287112] bdx_uncore: probe of 0000:ff:15.1 failed with error -22
>> [    5.294376] WARNING: CPU: 1 PID: 15 at ../arch/x86/events/intel/uncore.c:1065 uncore_change_type_ctx.isra.5+0xe6/0xf0
>> [    5.298362] Modules linked in:
>> [    5.298362] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [    5.298362] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [    5.298362] task: 00000000ae78bc8f task.stack: 00000000f79660c1
>> [    5.298362] RIP: 0010:uncore_change_type_ctx.isra.5+0xe6/0xf0
>> [    5.298362] RSP: 0000:ffffad85833b3db8 EFLAGS: 00010213
>> [    5.298362] RAX: 0000000000000000 RBX: ffff9857669b0200 RCX: 0000000000000001
>> [    5.298362] RDX: ffff985b6f000000 RSI: ffff985b66580400 RDI: ffffffffb0c1ae8c
>> [    5.298362] RBP: ffff985b66580400 R08: ffffffffb0c1ae8c R09: 0000000000000001
>> [    5.298362] R10: 0000000000000000 R11: 00000000003d0900 R12: 0000000000000000
>> [    5.298362] R13: ffffffffffffffff R14: 0000000000000001 R15: 0000000000000008
>> [    5.298362] FS:  0000000000000000(0000) GS:ffff985b6f000000(0000) knlGS:0000000000000000
>> [    5.298362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    5.298362] CR2: 0000000000000000 CR3: 0000000185c09001 CR4: 00000000003606e0
>> [    5.298362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    5.298362] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    5.298362] Call Trace:
>> [    5.298362]  uncore_event_cpu_online+0x283/0x340
>> [    5.298362]  ? uncore_event_cpu_offline+0x180/0x180
>> [    5.298362]  cpuhp_invoke_callback+0x8c/0x620
>> [    5.298362]  ? __schedule+0x1ad/0x6c0
>> [    5.298362]  ? sort_range+0x20/0x20
>> [    5.298362]  cpuhp_thread_fun+0xbc/0x140
>> [    5.298362]  smpboot_thread_fn+0x114/0x1d0
>> [    5.298362]  kthread+0x111/0x130
>> [    5.298362]  ? kthread_create_on_node+0x40/0x40
>> [    5.298362]  ret_from_fork+0x1f/0x30
>> [    5.298362] Code: 2a 44 89 73 10 41 83 c4 01 48 81 c5 40 01 00 00 45 3b 20 7c cf 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f f 
>> [    5.298362] ---[ end trace 4dc4c3d5f5afcd30 ]---
>> [    5.504808] Scanning for low memory corruption every 60 seconds
>> [    5.512347] Initialise system trusted keyrings
>> [    5.517470] workingset: timestamp_bits=40 max_order=23 bucket_order=0
>> [    5.524840] BUG: unable to handle kernel paging request at 0000000023314bf4
>> [    5.528761] IP: __kmalloc_track_caller+0xa8/0x210
>> [    5.528761] PGD 185c0a067 P4D 185c0a067 PUD 185c0c067 PMD 0 
>> [    5.528761] Oops: 0000 [#1] PREEMPT SMP
>> [    5.528761] Modules linked in:
>> [    5.528761] CPU: 14 PID: 1 Comm: swapper/0 Tainted: G        W        4.15.0-rc1-perf-00225-gb2a4e0a76b1d #782
>> [    5.528761] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
>> [    5.528761] task: 000000009e842725 task.stack: 000000008a63fd2d
>> [    5.528761] RIP: 0010:__kmalloc_track_caller+0xa8/0x210
>> [    5.528761] RSP: 0000:ffffad8580163d58 EFLAGS: 00010286
>> [    5.528761] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000012ce0e
>> [    5.528761] RDX: 000000000012cd0e RSI: 000000000012cd0e RDI: 000000000001dde0
>> [    5.528761] RBP: ffff985700000001 R08: ffff98576f407c00 R09: ffffffffb071edbf
>> [    5.528761] R10: ffffd54de1995600 R11: ffff985b6655915f R12: 0000000000000004
>> [    5.528761] R13: 00000000014000c0 R14: ffffffffb026c239 R15: ffff98576f407c00
>> [    5.528761] FS:  0000000000000000(0000) GS:ffff98576fb80000(0000) knlGS:0000000000000000
>> [    5.528761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    5.528761] CR2: ffffffffffffffff CR3: 0000000185c09001 CR4: 00000000003606e0
>> [    5.528761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [    5.528761] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [    5.528761] Call Trace:
>> [    5.528761]  kstrdup+0x2d/0x60
>> [    5.528761]  __kernfs_new_node+0x29/0x130
>> [    5.528761]  kernfs_new_node+0x24/0x50
>> [    5.528761]  kernfs_create_link+0x29/0x90
>> [    5.528761]  sysfs_do_create_link_sd.isra.0+0x5d/0xc0
>> [    5.528761]  sysfs_slab_add+0x1f5/0x270
>> [    5.528761]  ? set_debug_rodata+0x11/0x11
>> [    5.528761]  slab_sysfs_init+0x8b/0xfa
>> [    5.528761]  ? kmem_cache_init+0xf9/0xf9
>> [    5.528761]  do_one_initcall+0x4b/0x190
>> [    5.528761]  kernel_init_freeable+0x16e/0x1f5
>> [    5.528761]  ? rest_init+0xd0/0xd0
>> [    5.528761]  kernel_init+0xa/0x100
>> [    5.528761]  ret_from_fork+0x1f/0x30
>> [    5.528761] Code: 49 63 47 20 49 8b 3f 48 8d 8a 00 01 00 00 48 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 48 85 db 7 
>> [    5.528761] RIP: __kmalloc_track_caller+0xa8/0x210 RSP: ffffad8580163d58
>> [    5.528761] CR2: ffffffffffffffff
>> [    5.528761] ---[ end trace 4dc4c3d5f5afcd31 ]---
>> [    5.773089] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
>> [    5.773089] 
>> [    5.777076] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [    5.777076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> 