Date:	Fri, 29 Jul 2011 15:18:52 +0200
From:	synapse <synapse@...py.csoma.elte.hu>
To:	netdev@...r.kernel.org
Subject: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)

Hello guys,

I have a problem that I hope you can help me resolve. This is my first
real bug report, so please be patient :)

### Description:
3.0.0-rc4 routinely locks up with "BUG: unable to handle kernel NULL
pointer dereference at 000000000000002c".
The machine is an Intel SR2600 with a 10Gbit interface that serves a lot
of traffic; it locks up every few days.
The trace is at the end of this mail.
###

### My efforts:
I've traced the error back from atomic_dec_and_test() to:

ipv4_dst_check()
check_peer_redir()
neigh_release()
atomic_dec_and_test()

The pointer handed to atomic_dec_and_test() (&neigh->refcnt in
neigh_release()) is invalid, so atomic_dec_and_test() from
arch/x86/include/asm/atomic.h faults at address 0xffffffff8140f56f:

ffffffff8140f560:       48 8b 15 19 47 2f 00    mov    0x2f4719(%rip),%rdx        # 0xffffffff81703c80
ffffffff8140f567:       48 89 50 18             mov    %rdx,0x18(%rax)
ffffffff8140f56b:       48 8b 7b 40             mov    0x40(%rbx),%rdi
ffffffff8140f56f:       f0 ff 4f 2c             lock decl 0x2c(%rdi)
ffffffff8140f573:       0f 94 c0                sete   %al
ffffffff8140f576:       84 c0                   test   %al,%al
ffffffff8140f578:       0f 85 ab 00 00 00       jne    0xffffffff8140f629
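
As a cross-check of the faulting instruction, here is a minimal
user-space sketch of the address arithmetic behind
"lock decl 0x2c(%rdi)". The struct fake_neighbour layout and both helper
functions are hypothetical stand-ins; the only values taken from this
report are the 0x2c displacement, RDI = 0 and the fault address in CR2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct neighbour: only the position of
 * refcnt matters here. The 0x2c offset is read off the disassembly
 * (lock decl 0x2c(%rdi)), not taken from the kernel headers. */
struct fake_neighbour {
    unsigned char before_refcnt[0x2c]; /* whatever precedes refcnt */
    int refcnt;                        /* an atomic_t in the kernel */
};

/* Fail the build if the stand-in layout does not put refcnt at 0x2c. */
static_assert(offsetof(struct fake_neighbour, refcnt) == 0x2c,
              "refcnt must sit at offset 0x2c in the stand-in struct");

/* Address touched by "lock decl disp(%reg)": base register + disp. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* Address of refcnt for a given value of the neigh pointer. */
static uint64_t refcnt_address(uint64_t neigh)
{
    return effective_address(neigh,
                             offsetof(struct fake_neighbour, refcnt));
}
```

With the values from the register dump, refcnt_address(0) evaluates to
0x2c, which is the address reported in CR2. A non-NULL neigh would give
a fault address far away from 0x2c.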

From what I have seen, this code handles PMTU-related state. Judging by
the register dump (RDI is 0 and the faulting address is 0x2c), the neigh
pointer (struct neighbour *) passed to neigh_release() appears to be
NULL, and 0x2c would then be the offset of refcnt within struct
neighbour. I have no clue how this might happen, though I suspect
something releases the data structure concurrently. Note that this code
is invoked when redirect_learned.a4 is set and differs from rt_gateway
in ipv4_dst_check().

Is it possible that two packets are processed on two different cores,
and one core invalidates the rt entry that the other is still working
on (so that the second core ends up dereferencing a NULL pointer)?
###
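
To make the suspected scenario concrete, here is a minimal user-space
sketch using C11 atomics. Everything in it (struct fake_neigh, struct
fake_route, fake_neigh_release(), detach_neighbour()) is a hypothetical
stand-in and not kernel code, and the atomic exchange is only one
possible way to make a second caller see NULL safely; it is not meant
as the actual fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel structures. */
struct fake_neigh {
    atomic_int refcnt;
};

struct fake_route {
    _Atomic(struct fake_neigh *) neighbour;
};

/* Drop one reference. Returns 1 when the count reached zero.
 * Like the code path in the report, it must not be given NULL. */
static int fake_neigh_release(struct fake_neigh *n)
{
    assert(n != NULL);
    return atomic_fetch_sub(&n->refcnt, 1) == 1;
}

/* Detach the neighbour from the route at most once.
 * atomic_exchange() hands the old pointer to exactly one caller; any
 * later or concurrent caller gets NULL back and skips the release
 * instead of touching refcnt through a NULL pointer.
 * Returns 1 if this caller released a neighbour, 0 otherwise. */
static int detach_neighbour(struct fake_route *rt)
{
    struct fake_neigh *old =
        atomic_exchange(&rt->neighbour, (struct fake_neigh *)NULL);

    if (old == NULL)
        return 0;
    fake_neigh_release(old);
    return 1;
}
```

Calling detach_neighbour() twice on the same route releases the
reference once; the second call finds NULL and returns 0. A plain
"release, then store NULL" sequence has no such guarantee when two
callers run it on the same entry.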


This is just my clumsy attempt at tracking this down; I'm not a kernel
expert, unfortunately. I'm happy to provide further information. If I'm
completely on the wrong track, please let me know.

Thank you for any help,
Gergely Kalman


TRACE:
===============================================================
BUG: unable to handle kernel NULL pointer dereference at 000000000000002c
IP: [<ffffffff8140f56f>] ipv4_dst_check+0xaf/0x190
PGD 0
Oops: 0002 [#1] SMP
CPU 8
Modules linked in: 8021q garp bridge stp llc iptable_filter ip_tables ixgbe ioatdma mdio dca hed

Pid: 0, comm: kworker/0:1 Not tainted 3.0.0-rc4-10g-lvs-pktgen #1 Intel Corporation S5520UR/S5520UR
RIP: 0010:[<ffffffff8140f56f>]  [<ffffffff8140f56f>] ipv4_dst_check+0xaf/0x190
RSP: 0018:ffff8801efc83a40  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88014d428900 RCX: ffff8801a44fa000
RDX: 0000000000000000 RSI: ffff8801a4335bc0 RDI: 0000000000000000
RBP: 00000000fea2476d R08: 000000000000fa4b R09: 0000000000007d25
R10: 00000000000000c0 R11: 0000000000000003 R12: ffff8801a4335bc0
R13: 0000000000006bc1 R14: 0000000000000000 R15: ffff88016291da20
FS:  0000000000000000(0000) GS:ffff8801efc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000002c CR3: 0000000001697000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff8801e90ee000, task ffff8801e90d9680)
Stack:
  ffff88014d428900 ffff88016291d780 0000000000000000 ffffffff813dccfa
  ffff88036fff9000 ffff8801b77bfc58 ffff88016291d780 ffffffff81417a82
  ffff8801a44fb0a0 ffff88016291d780 ffff8801b77bfc58 ffff8801b77bfc80
Call Trace:
<IRQ>
  [<ffffffff813dccfa>] ? __sk_dst_check+0x4a/0x70
  [<ffffffff81417a82>] ? ip_queue_xmit+0x2b2/0x3c0
  [<ffffffff8142c23b>] ? tcp_transmit_skb+0x3bb/0x850
  [<ffffffff8142e8cc>] ? tcp_write_xmit+0x1ec/0xa10
  [<ffffffff8142f239>] ? __tcp_push_pending_frames+0x19/0x80
  [<ffffffff81426076>] ? tcp_data_snd_check+0x36/0x120
  [<ffffffff8142a5d9>] ? tcp_rcv_established+0x349/0x7c0
  [<ffffffff8143204f>] ? tcp_v4_do_rcv+0x10f/0x2e0
  [<ffffffff81412300>] ? ip_rcv_finish+0x350/0x350
  [<ffffffff81433102>] ? tcp_v4_rcv+0x4e2/0x7a0
  [<ffffffff8141237d>] ? ip_local_deliver_finish+0x7d/0x130
  [<ffffffff813e802e>] ? __netif_receive_skb+0x1ae/0x350
  [<ffffffff813edc78>] ? netif_receive_skb+0x78/0x80
  [<ffffffff813ee21b>] ? napi_gro_receive+0xbb/0xd0
  [<ffffffff813edda8>] ? napi_skb_finish+0x38/0x50
  [<ffffffffa004c372>] ? ixgbe_clean_rx_irq+0x4f2/0x780 [ixgbe]
  [<ffffffffa004eddd>] ? ixgbe_clean_rxtx_many+0xed/0x1f0 [ixgbe]
  [<ffffffff8120b890>] ? timerqueue_add+0x60/0xb0
  [<ffffffff813ee366>] ? net_rx_action+0x86/0x170
  [<ffffffff8104aab1>] ? __do_softirq+0x91/0x140
  [<ffffffff8107ccfa>] ? handle_irq_event_percpu+0x7a/0x140
  [<ffffffff81474e4c>] ? call_softirq+0x1c/0x30
  [<ffffffff8100428d>] ? do_softirq+0x4d/0x80
  [<ffffffff8104a975>] ? irq_exit+0xb5/0xc0
  [<ffffffff81003aac>] ? do_IRQ+0x5c/0xd0
  [<ffffffff814737d3>] ? common_interrupt+0x13/0x13
<EOI>
  [<ffffffff81251c8c>] ? acpi_hw_read_multiple+0x28/0x60
  [<ffffffff81261afd>] ? acpi_idle_enter_bm+0x22c/0x260
  [<ffffffff81261af8>] ? acpi_idle_enter_bm+0x227/0x260
  [<ffffffff813b7281>] ? cpuidle_idle_call+0x81/0xf0
  [<ffffffff810017d8>] ? cpu_idle+0x58/0xb0
Code: 00 89 83 d4 00 00 00 eb 98 0f 1f 00 48 85 db 74 16 48 8b 43 40 31 ff 48 85 c0 74 0f 48 8b 15 19 47 2f 00 48 89 50 18 48 8b 7b 40 <f0> ff 4f 2c 0f 94 c0 84 c0 0f 85 ab 00 00 00 48 c7 43 40 00 00
RIP  [<ffffffff8140f56f>] ipv4_dst_check+0xaf/0x190
  RSP <ffff8801efc83a40>
CR2: 000000000000002c
---[ end trace 8a3fd44eb302579f ]---
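
For reading the trace, here is a small sketch that decodes two of its
fields: the "Oops: 0002" page-fault error code and the
"ipv4_dst_check+0xaf/0x190" location. The helper names and the struct
are hypothetical; the bit meanings are the standard x86 page-fault
error code bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error code bits, as printed on the "Oops:" line. */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,      1: write access */
#define PF_USER  0x4u /* 0: kernel mode,      1: user mode */

/* Hypothetical holder for a "symbol+0xOFF/0xSIZE" location. */
struct oops_location {
    char symbol[64];
    uint64_t offset; /* offset of the instruction inside the function */
    uint64_t size;   /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE". Returns 0 on success, -1 otherwise. */
static int parse_location(const char *text, struct oops_location *loc)
{
    unsigned long long off, size;

    assert(text != NULL && loc != NULL);
    if (sscanf(text, "%63[^+]+%llx/%llx", loc->symbol, &off, &size) != 3)
        return -1;
    loc->offset = off;
    loc->size = size;
    return 0;
}

/* Start address of the function containing the faulting RIP. */
static uint64_t function_start(uint64_t rip,
                               const struct oops_location *loc)
{
    return rip - loc->offset;
}

/* True for a kernel-mode write to a page that is not present. */
static int is_kernel_write_to_unmapped(unsigned int error_code)
{
    return (error_code & PF_WRITE) != 0 &&
           (error_code & (PF_PROT | PF_USER)) == 0;
}
```

Applied to this trace, error code 0002 is a kernel-mode write to a
non-present page, and RIP 0xffffffff8140f56f minus the offset 0xaf puts
the start of ipv4_dst_check at 0xffffffff8140f4c0.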