Message-ID: <CA+icZUUNac0fx8hyVtJXS5q+1WMUdGaXvW7x8e8fL-bkekA5rA@mail.gmail.com>
Date: Sun, 6 Sep 2015 19:45:22 +0200
From: Sedat Dilek <sedat.dilek@...il.com>
To: Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
Baoquan He <bhe@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Denys Vlasenko <dvlasenk@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Thomas Graf <tgraf@...g.ch>, Ingo Molnar <mingo@...nel.org>,
"the arch/x86 maintainers" <x86@...nel.org>
Subject: [llvmlinux] percpu | bitmap issue? (Cannot boot on bare metal due to
a kernel NULL pointer dereference)
[ To the percpu folks; CCed some linux/bitmap people listed in [2]. ]
Hi,
this week I built an LLVM/Clang v3.7.0 toolchain and wanted to play
with LLVMLinux again.
Both kernels (v4.1.6 and v4.2) compiled successfully.
But I cannot boot on bare metal, so I played a bit with QEMU (KVM) to
see what's going on.
The good or bad news:
Both kernel versions show the same issue (for v4.1.6 I posted to
the llvmlinux ML, see [1]).
I am not sure if it is a percpu or bitmap issue.
My QEMU line looks like this...
root# qemu-system-x86_64 -enable-kvm -M pc -kernel $KPATH/bzImage \
    -initrd $KPATH/initrd.img -m 512 -net none -serial stdio \
    -append "root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200"
( I have also attached the output below as a text file, since Gmail/LKML
might truncate it badly. )
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 4.2.0-1-llvmlinux-small (sedat.dilek@...il.com@...box) (clang version 3.7.0 (tags/RELEASE_370/final)) #1 SMP Sun Sep 6 18:51:10 CEST 2015
[ 0.000000] Command line: root=/dev/ram0 console=ttyS0 hung_task_panic=1 earlyprintk=ttyS0,115200
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] x86/fpu: Legacy x87 FPU detected.
[ 0.000000] x86/fpu: Using 'lazy' FPU context switches.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009f3ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f400-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001fffcfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001fffd000-0x000000001fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] e820: last_pfn = 0x1fffd max_arch_pfn = 0x400000000
[ 0.000000] x86/PAT: PAT not supported by CPU.
[ 0.000000] found SMP MP-table at [mem 0x000fdb00-0x000fdb0f] mapped at [ffff8800000fdb00]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x1f200000-0x1f3fffff]
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x1f1fffff]
[ 0.000000] init_memory_mapping: [mem 0x1f400000-0x1fffcfff]
[ 0.000000] RAMDISK: [mem 0x1f46e000-0x1ffeefff]
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000FD9A0 000014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 0x000000001FFFD7B0 000034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 0x000000001FFFFF80 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 0x000000001FFFD9B0 002589 (v01 BXPC BXDSDT 00000001 INTL 20100528)
[ 0.000000] ACPI: FACS 0x000000001FFFFF40 000040
[ 0.000000] ACPI: SSDT 0x000000001FFFD910 00009E (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
[ 0.000000] ACPI: APIC 0x000000001FFFD830 000072 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 0x000000001FFFD7F0 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000001fffcfff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x1fff8000-0x1fffcfff]
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 0:1fff4001, primary cpu clock
[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x000000001fffcfff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000001fffcfff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000001fffcfff]
[ 0.000000] ACPI: PM-Timer IO Port: 0xb008
[ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[ 0.000000] e820: [mem 0x20000000-0xfeffbfff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
[ 0.000000] IP: [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] PGD 0
[ 0.000000] Oops: 0000 [#1] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-1-llvmlinux-small #1
[ 0.000000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 0.000000] task: ffffffff81e16500 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[ 0.000000] RIP: 0010:[<ffffffff814a43a0>] [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] RSP: 0000:ffffffff81e03e10 EFLAGS: 00010006
[ 0.000000] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000009
[ 0.000000] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000001
[ 0.000000] RBP: ffffffff81e03e88 R08: ffffffff81f1b830 R09: 0000000000000004
[ 0.000000] R10: 0000000000000003 R11: 0000000000000001 R12: 000000007fffffff
[ 0.000000] R13: 0000000000000008 R14: 0000000000000001 R15: 0000000000000007
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81f27000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: 0000000000000009 CR3: 0000000001e0f000 CR4: 00000000000006b0
[ 0.000000] Stack:
[ 0.000000] ffffffff81f82f5c ffffffff81e03ea8 0000000000007f78 0000000000002000
[ 0.000000] 0000000000200000 0000000000000000 0000000000200000 0000004000000001
[ 0.000000] 0000000000000000 0000000000000001 0000000000002000 0000000000007000
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff81f82f5c>] ? pcpu_build_alloc_info+0x31c/0x6e0
[ 0.000000] [<ffffffff81f5f030>] ? pcpu_cpu_distance+0x80/0x80
[ 0.000000] [<ffffffff81f5efb0>] ? setup_per_cpu_areas+0x290/0x290
[ 0.000000] [<ffffffff81f8282a>] pcpu_embed_first_chunk+0x3a/0x450
[ 0.000000] [<ffffffff81106d55>] ? printk+0x55/0x60
[ 0.000000] [<ffffffff81f5f100>] ? pcpu_fc_alloc+0xd0/0xd0
[ 0.000000] [<ffffffff81f5ed8d>] setup_per_cpu_areas+0x6d/0x290
[ 0.000000] [<ffffffff81f41197>] start_kernel+0x197/0x660
[ 0.000000] [<ffffffff81911237>] ? memblock_reserve+0x57/0x60
[ 0.000000] [<ffffffff81f408db>] x86_64_start_kernel+0x26b/0x280
[ 0.000000] Code: b8 01 00 00 00 c3 0f 1f 44 00 00 49 89 f8 31 d2 41 89 f1 41 c1 e9 06 b8 00 00 00 00 74 26 31 d2 45 89 ca 4c 89 c1 0f 1f 44 00 00 <48> 8b 39 e8 68 cd 00 00 89 d2 48 01 c2 48 83 c1 08 41 ff ca 75
[ 0.000000] RIP [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] RSP <ffffffff81e03e10>
[ 0.000000] CR2: 0000000000000009
[ 0.000000] ---[ end trace e6cd9fdd68e48fbf ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
[ 0.000000] IP: [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] PGD 0
[ 0.000000] Oops: 0000 [#2] SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G D 4.2.0-1-llvmlinux-small #1
[ 0.000000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 0.000000] task: ffffffff81e16500 ti: ffffffff81e00000 task.ti: ffffffff81e00000
[ 0.000000] RIP: 0010:[<ffffffff814a43a0>] [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] RSP: 0000:ffffffff81e03960 EFLAGS: 00010006
[ 0.000000] RAX: 0000000000000001 RBX: ffffffff81c4e0cb RCX: 0000000000000009
[ 0.000000] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000001
[ 0.000000] RBP: ffffffff81e03988 R08: ffffffff81f1b850 R09: 0000000000000004
[ 0.000000] R10: 0000000000000003 R11: 203a676e69636e79 R12: ffffffff81e16500
[ 0.000000] R13: 0000000000000000 R14: ffffffff81f1b850 R15: 0000000000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81f27000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: 0000000000000009 CR3: 0000000001e0f000 CR4: 00000000000006b0
[ 0.000000] Stack:
[ 0.000000] ffffffff8105944f 0000000000000000 ffffffff81c4e0cb ffffffff81e16500
[ 0.000000] 0000000000000009 ffffffff81e03a08 ffffffff8108efee ffffffff81e03a28
[ 0.000000] 0000000000000000 ffffffff81e16500 0000000000000002 0000000000000000
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff8105944f>] ? native_stop_other_cpus+0x3f/0x210
[ 0.000000] [<ffffffff8108efee>] panic+0x10e/0x280
[ 0.000000] [<ffffffff81091e79>] do_exit+0xaa9/0xae0
[ 0.000000] [<ffffffff8110b0d1>] ? rcu_lock_release+0x21/0x30
[ 0.000000] [<ffffffff8110a741>] ? kmsg_dump+0x181/0x190
[ 0.000000] [<ffffffff8101bb18>] oops_end+0xc8/0xf0
[ 0.000000] [<ffffffff81078400>] no_context+0x410/0x460
[ 0.000000] [<ffffffff81186d6a>] ? is_ftrace_trampoline+0x5a/0x80
[ 0.000000] [<ffffffff810b8306>] ? __kernel_text_address+0x76/0xb0
[ 0.000000] [<ffffffff8149c83d>] ? vsnprintf+0x8d/0x510
[ 0.000000] [<ffffffff810786c3>] __bad_area_nosemaphore+0x53/0x2a0
[ 0.000000] [<ffffffff8149c83d>] ? vsnprintf+0x8d/0x510
[ 0.000000] [<ffffffff81070531>] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x20
[ 0.000000] [<ffffffff81077e6b>] bad_area_nosemaphore+0x2b/0x40
[ 0.000000] [<ffffffff8107799b>] __do_page_fault+0x59b/0x5d0
[ 0.000000] [<ffffffff81070531>] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x20
[ 0.000000] [<ffffffff81077a42>] do_page_fault+0x72/0xc0
[ 0.000000] [<ffffffff8191b208>] page_fault+0x28/0x30
[ 0.000000] [<ffffffff814a43a0>] ? __bitmap_weight+0x20/0x60
[ 0.000000] [<ffffffff81f82f5c>] ? pcpu_build_alloc_info+0x31c/0x6e0
[ 0.000000] [<ffffffff81f5f030>] ? pcpu_cpu_distance+0x80/0x80
[ 0.000000] [<ffffffff81f5efb0>] ? setup_per_cpu_areas+0x290/0x290
[ 0.000000] [<ffffffff81f8282a>] pcpu_embed_first_chunk+0x3a/0x450
[ 0.000000] [<ffffffff81106d55>] ? printk+0x55/0x60
[ 0.000000] [<ffffffff81f5f100>] ? pcpu_fc_alloc+0xd0/0xd0
[ 0.000000] [<ffffffff81f5ed8d>] setup_per_cpu_areas+0x6d/0x290
[ 0.000000] [<ffffffff81f41197>] start_kernel+0x197/0x660
[ 0.000000] [<ffffffff81911237>] ? memblock_reserve+0x57/0x60
[ 0.000000] [<ffffffff81f408db>] x86_64_start_kernel+0x26b/0x280
[ 0.000000] Code: b8 01 00 00 00 c3 0f 1f 44 00 00 49 89 f8 31 d2 41 89 f1 41 c1 e9 06 b8 00 00 00 00 74 26 31 d2 45 89 ca 4c 89 c1 0f 1f 44 00 00 <48> 8b 39 e8 68 cd 00 00 89 d2 48 01 c2 48 83 c1 08 41 ff ca 75
[ 0.000000] RIP [<ffffffff814a43a0>] __bitmap_weight+0x20/0x60
[ 0.000000] RSP <ffffffff81e03960>
[ 0.000000] CR2: 0000000000000009
[ 0.000000] ---[ end trace e6cd9fdd68e48fc0 ]---
To me this looks like a percpu issue, but I might be wrong.
I also added folks listed in [2].
I will play a bit more with some "for-4.3" percpu and linux/bitmap
fixes [2] and [3] which are not in Linux v4.2 (which is now my base
for further tests).
Attached are my kernel config and a dmesg-like log (QEMU snippet).
I can send my important files if someone wants to play with QEMU.
$ LC_ALL=C ll important-files/
total 40896
drwxr-xr-x 2 wearefam wearefam 4096 Sep 6 18:56 ./
drwxr-xr-x 3 wearefam wearefam 4096 Sep 6 18:55 ../
-rw-r--r-- 1 wearefam wearefam 3512984 Sep 6 18:51 System.map
-rw-r--r-- 1 wearefam wearefam 4631200 Sep 6 18:51 bzImage
-rw-r--r-- 1 wearefam wearefam 12062720 Sep 6 18:54 initrd.img
-rwxr-xr-x 1 wearefam wearefam 27501165 Sep 6 18:51 vmlinux*
If you need more information, please let me know.
Any help is appreciated.
Thanks.
Regards,
- Sedat -
[1] http://lists.linuxfoundation.org/pipermail/llvmlinux/2015-September/001337.html
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1a1d48a4a8fde49aedc045d894efe67173d59fe0
[3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=292c24a073ee34c629966eec8b48d54b0a206667
View attachment "PERCPU_LLVMLINUX.txt" of type "text/plain" (11931 bytes)
Download attachment "config-4.2.0-1-llvmlinux-small" of type "application/octet-stream" (128613 bytes)