linux-kernel - Re: Current Git: BUG: unable to handle kernel paging request at 0000000001a40ca0

Date:	Thu, 24 Jul 2008 08:51:51 +0200
From:	Dieter Ries <clip2@....de>
To:	Vegard Nossum <vegard.nossum@...il.com>
CC:	linux-kernel@...r.kernel.org, jgarzik@...ox.com,
	netdev@...r.kernel.org, Pekka Enberg <penberg@...helsinki.fi>,
	jeffrey.t.kirsher@...el.com, e1000-devel@...ts.sourceforge.net
Subject: Re: Current Git: BUG: unable to handle kernel paging request at 0000000001a40ca0

Vegard Nossum wrote:
> On Wed, Jul 23, 2008 at 11:53 PM, Dieter Ries <clip3@....de> wrote:
>>>> Dieter: If this is reproducible, it would probably help quite a bit to
>>>> configure the kernel with CONFIG_SLUB_DEBUG and boot with
>>>> slub_debug=FZPUT (unless you already have CONFIG_SLUB_DEBUG_ON set, in
>>>> which case you are already running with the SLUB debugging at boot).
>>>> It might catch the corruption before it becomes fatal, or give us some
>>>> more clues anyway.
>> I tried to bisect the bug, which failed because there were too many kernels
>> not booting with other problems, I guess bisecting just fails in the merge
>> window.
>>
>> With CONFIG_SLUB_DEBUG_ON the output looks different, unfortunately
>> netconsole stops before those are transmitted.

I think I managed to catch one of those:


general protection fault: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.26-06373-gcaf076e #49
RIP: 0010:[<ffffffff805e08f9>]  [<ffffffff805e08f9>] nf_nat_move_storage+0x21/0x7a
RSP: 0018:ffffffff8091ab80  EFLAGS: 00010206
RAX: ffffffff805e08d8 RBX: ffff88007d1fb948 RCX: 000000000000006b
RDX: ffff88007d175e10 RSI: ffff88007d175e7b RDI: ffff88007d1fb948
RBP: ffffffff8091aba0 R08: 0000000000000000 R09: ffff88007d175e90
R10: ffffe20000000008 R11: ffff88007d175e10 R12: 59d2c3ffff88007d
R13: ffff88007d175e7b R14: 00000000000000a0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffffffff8089ee80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff808b0000, task ffffffff80842340)
Stack:  0000000000000002 ffff88007d3d2000 ffff88007d1fb948 0000000000000070
  ffffffff8091abf0 ffffffff8059d3c4 ffffffff8091ac40 0000000100000001
  ffffffff809e3658 ffff88007d3d2000 0000000000000002 ffff88007f9f6500
Call Trace:
  <IRQ>  [<ffffffff8059d3c4>] __nf_ct_ext_add+0x15f/0x1f7
  [<ffffffff805e762c>] nf_nat_fn+0x84/0x152
  [<ffffffff805e77d8>] nf_nat_in+0x2f/0x71
  [<ffffffff805953d8>] nf_iterate+0x48/0x85
  [<ffffffff805b19c0>] ? ip_rcv_finish+0x0/0x35d
  [<ffffffff80595478>] nf_hook_slow+0x63/0xcb
  [<ffffffff805b19c0>] ? ip_rcv_finish+0x0/0x35d
  [<ffffffff8028fe7c>] ? __slab_alloc+0x413/0x4bd
  [<ffffffff805b21b8>] ip_rcv+0x257/0x297
  [<ffffffff80581461>] netif_receive_skb+0x1f1/0x263
  [<ffffffff80495b34>] e1000_receive_skb+0x46/0x5d
  [<ffffffff8049830b>] e1000_clean_rx_irq+0x20e/0x2a6
  [<ffffffff8024cce8>] ? getnstimeofday+0x3f/0xa0
  [<ffffffff804952ce>] e1000_clean+0x6d/0x218
  [<ffffffff8024ad39>] ? hrtimer_get_next_event+0xa8/0xb8
  [<ffffffff80583569>] net_rx_action+0xa9/0x17c
  [<ffffffff80239b51>] __do_softirq+0x65/0xd5
  [<ffffffff8020c5dc>] call_softirq+0x1c/0x28
  [<ffffffff8020dd0a>] do_softirq+0x39/0x77
  [<ffffffff80239aab>] irq_exit+0x44/0x85
  [<ffffffff8020dff5>] do_IRQ+0x147/0x16a
  [<ffffffff8020b8a1>] ret_from_intr+0x0/0xa
  <EOI>  [<ffffffff80446d94>] ? acpi_idle_enter_bm+0x2a7/0x317
  [<ffffffff80446d8a>] ? acpi_idle_enter_bm+0x29d/0x317
  [<ffffffff805672cd>] ? menu_select+0x75/0x9e
  [<ffffffff8056660e>] ? cpuidle_idle_call+0x75/0xa7
  [<ffffffff80209fd6>] ? cpu_idle+0x69/0x8c
  [<ffffffff8064d9ed>] ? rest_init+0x61/0x63
  [<ffffffff808bcd9c>] ? start_kernel+0x2ad/0x2b9
  [<ffffffff808bc275>] ? x86_64_start_reservations+0x84/0x88
  [<ffffffff808bc385>] ? x86_64_start_kernel+0xe4/0xeb


Code: ff 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 e8 c6 a8 c2 ff 4c 8b 66 20 48 89 fb 49 89 f5 4d 85 e4 74 51 <49> f7 44 24 78 80 01 00 00 74 46 48 c7 c7 78 6a 9e 80 e8 8f 2e
RIP  [<ffffffff805e08f9>] nf_nat_move_storage+0x21/0x7a
  RSP <ffffffff8091ab80>
---[ end trace 6f6148e13aab302e ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

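A note on reading the trace above: it reports a general protection fault rather than an "unable to handle kernel paging request". On x86-64 that is what an access through a non-canonical address produces. The bytes at the marked position in the Code: line (49 f7 44 24 78 80 01 00 00) decode to testq $0x180,0x78(%r12), and the preceding 4c 8b 66 20 decodes to mov 0x20(%rsi),%r12, so R12 was just loaded from memory and then dereferenced. The sketch below is only an illustration, not part of any kernel tooling: it assumes 48-bit virtual addresses (4-level paging), and the names is_canonical, non_canonical and REGISTERS are made up for the example. It checks which of the dumped register values are canonical.

```python
# Sketch: check which register values from the oops are canonical x86-64
# addresses. Assumption: 48-bit virtual addressing (4-level paging).

def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all equal."""
    top = addr >> (va_bits - 1)                 # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones

# Values copied from the register dump in the trace above.
REGISTERS = {
    "RBX": 0xffff88007d1fb948,
    "RSI": 0xffff88007d175e7b,
    "R12": 0x59d2c3ffff88007d,
    "R13": 0xffff88007d175e7b,
}

def non_canonical(regs: dict) -> list:
    """Return the names of registers holding non-canonical addresses."""
    return [name for name, value in regs.items() if not is_canonical(value)]

if __name__ == "__main__":
    print(non_canonical(REGISTERS))
```

Only R12 fails the check, which is consistent with the fault being raised at the instruction that dereferences it. This says nothing about where the bad value came from, only that the value read from 0x20(%rsi) was not a valid pointer.
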
>>
>> As there are always some lines about e1000 in the backtraces, I tried to
>> boot without LAN cable connected, and it worked, and crashed afterwards when
>> I plugged the cable in, with a bug in net/core/dev.c.
>>
>> Should I copy the messages with CONFIG_SLUB_DEBUG_ON by hand, or are just
>> some parts important?
> 
> There were some e1000 patches in flight on LKML recently; you might be
> able to find them and see if it helps you. It also seems that some
> changes were just committed to -git, so I guess you should try the
> very latest from there.

I reverted some of the most recent e1000 patches one by one, but the
last ~12 that I have reverted so far didn't solve the problem.

> 
> You also Cced netdev from the start, so somebody from there should be
> able to help you more from here than I. :-)
> 
> 
> Vegard
> 
cu
Dieter

-- 
3rd Law of Computing:
         Anything that can go wr
fortune: Segmentation violation -- Core dumped