[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171109051905.pdlsyrbzrwlsjbrs@wfg-t540p.sh.intel.com>
Date: Thu, 9 Nov 2017 13:19:05 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Network Development <netdev@...r.kernel.org>,
Linux Wireless List <linux-wireless@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [run_timer_softirq] BUG: unable to handle kernel paging request
at 0000000000010007
On Mon, Oct 30, 2017 at 12:29:47PM -0700, Linus Torvalds wrote:
>On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>>
>> Here are 3 dmesgs related to wireless and 1 from ethernet.
>
>Fengguang, these would be lovelier still _if_ you have DEBUG_INFO
>enabled on the kernel, and your script were to find things like
>"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.
>
>So
>
>> [ 235.425464] BUG: unable to handle kernel paging request at 0000000000010007
>> [ 235.425470] IP: run_timer_softirq+0x13a/0x470
>
>would also then have
>
> run_timer_softirq at timer.c:XYZ
>
>which would make it easier to see exactly _what_ it is that faults. As
>it is, I think there's a fair number of inlining that makes it hard to
>see the cause, but that faddrtoline would make very obvious.
>
>Finding that "symbol+xyz/abc" pattern should be fairly easy to
>automate, and should fit the 0day model fairly well. No?
>
>In this case, the trapping instruction ends up decoding to
>
> 0: 4c 8d 6c c5 90 lea -0x70(%rbp,%rax,8),%r13
> 5: 49 8b 45 00 mov 0x0(%r13),%rax
> 9: 48 85 c0 test %rax,%rax
> c: 74 de je 0xffffffffffffffec
> e: 4d 8b 7d 00 mov 0x0(%r13),%r15
> 12: 4d 89 7c 24 08 mov %r15,0x8(%r12)
> 17: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 1c: 49 8b 07 mov (%r15),%rax
> 1f: 49 8b 57 08 mov 0x8(%r15),%rdx
> 23: 48 85 c0 test %rax,%rax
> 26: 48 89 02 mov %rax,(%rdx)
> 29: 74 04 je 0x2f
> 2b:* 48 89 50 08 mov %rdx,0x8(%rax) <-- trapping instruction
> 2f: 41 f6 47 2a 20 testb $0x20,0x2a(%r15)
> 34: 49 c7 47 08 00 00 00 movq $0x0,0x8(%r15)
>
>and %rax has the value 0xffff, so yes, it will trap at 0x10007.
>
>It's not trivial to see just *wjhat* access it is.
>
>I *think* that "testb $32" is checking for TIMER_IRQSAFE in
>expire_timers(), and that the oops is due to the list operations in
>detach_timer() (inlined).
>
>Which doesn't really help: it looks like the timer lists are corrupt.
>With some luck, some register state could have the timer function
>pointer in it, and we'd get a hint of *which* timer this is, but that
>doesn't look to be the case here either.
>
>I'm not seeing anything to really help debug this here.
Yes it's accessing the list. Here is the faddr2line output.
[ 190.983445] PASS fils_sk_hlp_no_rapid_commit 0.549105 2017-11-09 12:41:50.114825
[ 190.983447]
[ 190.986376] passed all 1 test case(s)
[ 190.986377]
[ 191.006705] ignored_by_lkp fils_sk_hlp_oom
[ 191.006708]
[ 191.007635] general protection fault: 0000 [#1] SMP
[ 191.007636] Modules linked in: cmac ccm arc4 mac80211_hwsim mac80211 cfg80211 rfkill rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sr_mod cdrom sg ata_generic pata_acpi sb_edac crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc ppdev aesni_intel crypto_simd glue_helper cryptd snd_pcm snd_timer snd soundcore pcspkr serio_raw ata_piix i2c_piix4 libata floppy parport_pc parport ip_tables
[ 191.007656] CPU: 1 PID: 186 Comm: sed Not tainted 4.14.0-rc8-00013-g87df261 #1
[ 191.007657] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 191.007659] task: ffff88007fa58000 task.stack: ffffc900008b0000
[ 191.007663] RIP: 0010:run_timer_softirq+0x13a/0x470:
run_timer_softirq at include/linux/list.h:651
[ 191.007664] RSP: 0018:ffff88013fd03ec8 EFLAGS: 00010086
[ 191.007665] RAX: dead000000000200 RBX: 0000000000000001 RCX: ffff8800710bbb80
[ 191.007665] RDX: ffff88013fd03ed0 RSI: ffff88013fd122e8 RDI: ffff88013fd03ed8
[ 191.007666] RBP: ffff88013fd03f40 R08: ffff88013fd127f8 R09: ffff88013fd03ed8
[ 191.007667] R10: ffff88013fd12330 R11: 000000000000008f R12: ffff88013fd122c0
[ 191.007668] R13: ffff88013fd03ed0 R14: dead000000000200 R15: ffff8800710bbb80
[ 191.007669] FS: 00007fabab3b9800(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[ 191.007670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 191.007674] CR2: 00007fc7ea4895d0 CR3: 000000007fa57002 CR4: 00000000001606e0
[ 191.007678] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 191.007679] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 191.007680] Call Trace:
[ 191.007681] <IRQ>
[ 191.007685] ? kvm_clock_get_cycles+0x25/0x30:
kvm_clock_get_cycles at arch/x86/include/asm/preempt.h:81
[ 191.007687] ? ktime_get+0x40/0xa0:
ktime_get at kernel/time/timekeeping_internal.h:19
[ 191.007690] __do_softirq+0x104/0x2ab
[ 191.007693] irq_exit+0xf1/0x100:
irq_exit at kernel/softirq.c:364
[ 191.007695] smp_apic_timer_interrupt+0x69/0x140:
smp_apic_timer_interrupt at arch/x86/include/asm/irq_regs.h:27
[ 191.007696] apic_timer_interrupt+0x9d/0xb0
[ 191.007697] </IRQ>
[ 191.007701] RIP: 0010:console_unlock+0x1f0/0x4f0:
console_unlock at arch/x86/include/asm/paravirt.h:778
[ 191.007702] RSP: 0018:ffffc900008b3d18 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff10
[ 191.007703] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
[ 191.007703] RDX: ffffc900008b3cd0 RSI: 0000000000000004 RDI: 0000000000000247
[ 191.007704] RBP: ffffc900008b3d50 R08: 0000000000000000 R09: 0000000000000000
[ 191.007705] R10: ffff8800000be400 R11: 0000000082354601 R12: 000000000000002e
[ 191.007705] R13: ffffffff820755a0 R14: 0000000000000000 R15: ffffffff823546a0
[ 191.007708] vprintk_emit+0x302/0x3b0:
vprintk_emit at kernel/printk/printk.c:1760
[ 191.007710] printk_emit+0x33/0x3b:
printk_emit at kernel/printk/printk.c:1782
[ 191.007713] ? __kmalloc+0x195/0x210:
__kmalloc at mm/slab.h:421
[ 191.007715] devkmsg_write+0xf6/0x150:
devkmsg_write at kernel/printk/printk.c:815
[ 191.007718] __vfs_write+0xd3/0x150:
__vfs_write at fs/read_write.c:470
[ 191.007719] vfs_write+0xc7/0x1b0:
vfs_write at fs/read_write.c:545
[ 191.007721] SyS_write+0x46/0xa0:
SyS_write at fs/read_write.c:590
[ 191.007722] entry_SYSCALL_64_fastpath+0x1a/0xa5:
entry_SYSCALL_64_fastpath at arch/x86/entry/entry_64.S:206
[ 191.007723] RIP: 0033:0x7fabaaab6720
[ 191.007724] RSP: 002b:00007fff07d55008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 191.007725] RAX: ffffffffffffffda RBX: 0000000000000106 RCX: 00007fabaaab6720
[ 191.007725] RDX: 0000000000000001 RSI: 0000000000611598 RDI: 0000000000000001
[ 191.007726] RBP: 000000000000022c R08: 00007fabaad74760 R09: 00007fabaad73600
[ 191.007727] R10: 000000000000001e R11: 0000000000000246 R12: 00000000021ada10
[ 191.007727] R13: 0000000000fd4940 R14: 00007fff07d54f30 R15: 000000000000022c
[ 191.007729] Code: 4c 8d 6c c5 90 49 8b 45 00 48 85 c0 74 de 4d 8b 7d 00 4d 89 7c 24 08 0f 1f 44 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 4c 89 e7 4d
[ 191.007748] RIP: run_timer_softirq+0x13a/0x470:
run_timer_softirq at include/linux/list.h:651 RSP: ffff88013fd03ec8
[ 191.007751] ---[ end trace bb3719ef16688f4f ]---
[ 191.007752] Kernel panic - not syncing: Fatal exception in interrupt
[ 191.007773] Kernel Offset: disabled
Regards,
Fengguang
View attachment "dmesg-vm-lkp-hsw01-4G-2:20171109124159:x86_64-rhel-7.2+CONFIG_DEBUG_INFO_REDUCED:4.14.0-rc8-00013-g87df261:1" of type "text/plain" (2086423 bytes)
View attachment ".config" of type "text/plain" (163325 bytes)
Powered by blists - more mailing lists