Message-ID: <1659861.dHpGAGoXHj@cpaasch-mac>
Date:	Tue, 22 Jan 2013 11:15:17 +0100
From:	Christoph Paasch <christoph.paasch@...ouvain.be>
To:	Ian Campbell <Ian.Campbell@...rix.com>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Sony Chacko <sony.chacko@...gic.com>,
	Rajesh Borundia <rajesh.borundia@...gic.com>,
	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6

Hello,

I have a scenario where I can trigger a bug on kernels >= 3.3 and <= 3.6. 
In particular, I can reproduce it with the latest longterm-stable kernel, v3.4.26.

The crash dumps/warnings can be seen below. Sometimes only the warning appears,
sometimes the crash follows as well, but the problem shows up every time I run
my scenario.


How to reproduce the bug (I have HP ProLiant DL165 machines with HP NC375T 1Gb
interfaces):

 * Launch an iperf session (-t 10) to a server over a 1Gbps interface.

 * After 5 seconds, remove the IP address from the client's interface
with ip addr del dev [itf] [ip]

 * Wait 10 more seconds, then kill iperf on both the client and the server.

 * Then run: ifconfig [itf] down

Now the crash happens.
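The client-side steps above could be scripted roughly as follows. This is a minimal sketch, not the harness used for this report: the helper names build_steps and reproduce are hypothetical, and it assumes iperf 2 client syntax (-c, -t), an iperf server (iperf -s) already running on the peer, and root privileges on the client. Killing the server-side iperf is left manual. By default it only returns the planned timeline (a dry run).

```python
import subprocess
import time


def build_steps(itf, ip, server):
    """Planned client-side timeline: (seconds from start, action, argv).

    Hypothetical helper; itf, ip and server are placeholders such as
    "eth0", "10.0.0.1" and "10.0.0.2".
    """
    return [
        (0, "start", ["iperf", "-c", server, "-t", "10"]),  # 10 s iperf session
        (5, "run", ["ip", "addr", "del", "dev", itf, ip]),  # drop the address mid-test
        (15, "kill", None),                                  # 10 s later: stop the client
        (15, "run", ["ifconfig", itf, "down"]),              # bring the interface down
    ]


def reproduce(itf, ip, server, dry_run=True):
    """Walk the timeline; commands are only executed when dry_run is False."""
    start = time.monotonic()
    client = None
    log = []
    for at, action, argv in build_steps(itf, ip, server):
        log.append((at, action, argv))
        if dry_run:
            continue
        # Sleep until this step's offset from the start of the run.
        delay = at - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        if action == "start":
            client = subprocess.Popen(argv)
        elif action == "kill" and client is not None:
            client.kill()
            client.wait()
        elif action == "run":
            subprocess.run(argv, check=True)
    return log


if __name__ == "__main__":
    for at, action, argv in reproduce("eth0", "10.0.0.1", "10.0.0.2"):
        print(f"t={at:>2}s {action:<5} {' '.join(argv) if argv else ''}")
```

Calling reproduce(..., dry_run=False) as root would execute the same sequence; the dry run is the default so the timeline can be reviewed first.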

What I observe in netxen_release_tx_buffers is that on the 18th iteration (j
== 17), buffrag->length == 0, while buffrag->frag_count is 18.
More rarely, buffrag->length instead looks like garbage (e.g., a value >
2^32).
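One way to see how the 18th iteration could land outside the array is a simplified model of the walk, written in Python rather than as the driver's C. It rests on assumptions not stated in this report: frag_array is taken to hold MAX_SKB_FRAGS + 1 entries (the linear head plus one per page fragment), frag_count is taken to count the head as well, and the inner loop is taken to advance buffrag before each use. slots_touched is a hypothetical name, and its default of 17 is MAX_SKB_FRAGS with 4 KiB pages after 9d4dde521577.

```python
def slots_touched(frag_count, max_skb_frags=17):
    """Model which frag_array slots a release loop reads.

    Assumptions (not the driver's verified code): the array holds
    max_skb_frags + 1 entries, buffrag starts at the head entry (index 0),
    and the loop runs j = 0 .. frag_count - 1, doing buffrag++ before use.
    Returns (j, index, in_bounds) for each fragment iteration.
    """
    array_len = max_skb_frags + 1
    index = 0  # buffrag = frag_array (the linear head)
    touched = []
    for j in range(frag_count):
        index += 1  # buffrag++
        touched.append((j, index, index < array_len))
    return touched


if __name__ == "__main__":
    # 17 page fragments plus the head gives frag_count == 18, as observed.
    for j, index, ok in slots_touched(18):
        if not ok:
            print(f"j == {j}: reads frag_array[{index}], past the last slot")
```

Under these assumptions, frag_count == 18 makes iteration j == 17 read frag_array[18], one slot past the end, which would fit a zero or garbage length. With the pre-9d4dde521577 value of 18 the same walk stays in bounds. Whether the real driver matches the model still has to be checked against the netxen source.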


I bisected this, and it was introduced by commit 9d4dde521577 (net: only use a
single page of slop in MAX_SKB_FRAGS).
Eric fixed it in commit 5640f7685831 (net: use a per task frag allocator), so
kernels after 3.6 are not affected.
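For reference, the change in 9d4dde521577 amounts to the following arithmetic on 4 KiB-page systems such as these x86-64 machines: the slop term in the MAX_SKB_FRAGS definition goes from two pages to one. The sketch mirrors the C integer division in Python; max_skb_frags is a hypothetical helper, and it models only the basic 65536/PAGE_SIZE + slop formula.

```python
def max_skb_frags(page_size, slop_pages):
    """MAX_SKB_FRAGS as 65536 / PAGE_SIZE + slop (C integer division)."""
    return 65536 // page_size + slop_pages


PAGE_SIZE = 4096  # x86-64
before = max_skb_frags(PAGE_SIZE, 2)  # prior definition: two pages of slop
after = max_skb_frags(PAGE_SIZE, 1)   # after 9d4dde521577: a single page

print(f"MAX_SKB_FRAGS: {before} -> {after}")
```

This prints "MAX_SKB_FRAGS: 18 -> 17", so any driver array declared with MAX_SKB_FRAGS + 1 entries would shrink from 19 to 18.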

As this bug is present in the longterm-stable 3.4 kernel, should Eric's patch be
backported?
If not, could somebody with more knowledge of this part of the code take a look
at it, or give me a pointer on how I could solve this properly?

Reverting commit 9d4dde521577 (net: only use a single page of slop in 
MAX_SKB_FRAGS) fixes it for me on 3.4.26.



Thanks,
Christoph



[  610.315966] ------------[ cut here ]------------
[  610.371099] WARNING: at /home/cpaasch/builder/net-next/lib/dma-debug.c:865 check_unmap+0x18e/0x61e()
[  610.480197] Hardware name: ProLiant DL165 G7
[  610.531168] netxen_nic 0000:05:00.2: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000012] [size=0 bytes]
[  610.698391] Modules linked in:
[  610.734935] Pid: 3728, comm: ip Not tainted 3.4.26-mptcp #30
[  610.802511] Call Trace:
[  610.831692]  [<ffffffff81025af7>] warn_slowpath_common+0x80/0x98
[  610.903424]  [<ffffffff81025ba3>] warn_slowpath_fmt+0x41/0x43
[  610.972041]  [<ffffffff811c93d1>] check_unmap+0x18e/0x61e
[  611.036505]  [<ffffffff811c99aa>] debug_dma_unmap_page+0x50/0x52
[  611.108236]  [<ffffffff813642f3>] netxen_release_tx_buffers+0x11e/0x175
[  611.187232]  [<ffffffff813621a6>] __netxen_nic_down+0x12c/0x13f
[  611.257922]  [<ffffffff81362290>] netxen_nic_close+0x13/0x17
[  611.325502]  [<ffffffff813fc7dc>] __dev_close_many+0x90/0xbc
[  611.393079]  [<ffffffff813fc839>] __dev_close+0x31/0x42
[  611.455469]  [<ffffffff813fa38d>] __dev_change_flags+0xb9/0x13d
[  611.526160]  [<ffffffff813fd16e>] dev_change_flags+0x1c/0x52
[  611.593740]  [<ffffffff814075ad>] do_setlink+0x2c0/0x7d2
[  611.657167]  [<ffffffff8146fcb5>] ? inet6_fill_ifla6_attrs+0x205/0x219
[  611.735124]  [<ffffffff814085a3>] rtnl_newlink+0x26b/0x4a1
[  611.800626]  [<ffffffff81408400>] ? rtnl_newlink+0xc8/0x4a1
[  611.867166]  [<ffffffff81419ebb>] ? netlink_sendmsg+0x22b/0x2b2
[  611.937859]  [<ffffffff8109fcf3>] ? check_object+0x13b/0x1df
[  612.005437]  [<ffffffff8140831d>] rtnetlink_rcv_msg+0x22c/0x247
[  612.076128]  [<ffffffff814080f1>] ? rtnetlink_rcv+0x28/0x28
[  612.142668]  [<ffffffff81419c3f>] netlink_rcv_skb+0x3e/0x8f
[  612.209209]  [<ffffffff814080ea>] rtnetlink_rcv+0x21/0x28
[  612.273673]  [<ffffffff81419a01>] netlink_unicast+0x134/0x1ab
[  612.342289]  [<ffffffff81419eda>] netlink_sendmsg+0x24a/0x2b2
[  612.410908]  [<ffffffff813ebb1a>] sock_sendmsg+0xb8/0xd1
[  612.474332]  [<ffffffff81073a29>] ? filemap_fault+0x199/0x35d
[  612.542948]  [<ffffffff810732ab>] ? unlock_page+0x2d/0x32
[  612.607412]  [<ffffffff810894c2>] ? __do_fault+0x3ce/0x409
[  612.672916]  [<ffffffff813eb239>] ? move_addr_to_kernel+0x3a/0x51
[  612.745685]  [<ffffffff813f4f76>] ? verify_iovec+0x59/0xaf
[  612.811187]  [<ffffffff813ec675>] __sys_sendmsg+0x1b9/0x23e
[  612.877725]  [<ffffffff8108c2ab>] ? handle_mm_fault+0x1b7/0x1cd
[  612.948420]  [<ffffffff8101f08e>] ? do_page_fault+0x336/0x375
[  613.017035]  [<ffffffff81090681>] ? do_brk+0x2e4/0x346
[  613.078385]  [<ffffffff813ec80d>] sys_sendmsg+0x3d/0x5e
[  613.140775]  [<ffffffff814e62a2>] system_call_fastpath+0x16/0x1b
[  613.212502] ---[ end trace d4f0eb8a4ca35e8a ]---



[  719.276359] BUG: unable to handle kernel paging request at ffffc900103cd000
[  719.359733] IP: [<ffffffff81364295>] netxen_release_tx_buffers+0xc0/0x175
[  719.440926] PGD 42d851067 PUD 42d852067 PMD 42c11b067 PTE 0
[  719.507897] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  719.562307] CPU 7
[  719.584214] Modules linked in:
[  719.622944]
[  719.640700] Pid: 4563, comm: ip Tainted: G        W    3.4.26-mptcp #30 HP ProLiant DL165 G7
[  719.741707] RIP: 0010:[<ffffffff81364295>]  [<ffffffff81364295>] netxen_release_tx_buffers+0xc0/0x175
[  719.851952] RSP: 0018:ffff880422c79668  EFLAGS: 00010246
[  719.915380] RAX: ffff88042dbce360 RBX: ffffc900103cced0 RCX: 0000000000000000
[  720.000603] RDX: 0000000000000011 RSI: 0000000000000282 RDI: 0000000000000000
[  720.085826] RBP: ffff880422c796b8 R08: 0000000000000000 R09: ffff88042dbce3e8
[  720.171049] R10: 0000000000000001 R11: 0000000000000206 R12: ffffc900103ccff8
[  720.256273] R13: ffff88042d9aa720 R14: 0000000000000012 R15: 0000000000000000
[  720.341499] FS:  00007f191d157700(0000) GS:ffff88043fdc0000(0000) knlGS:0000000000000000
[  720.438138] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  720.506755] CR2: ffffc900103cd000 CR3: 0000000422cbd000 CR4: 00000000000007e0
[  720.591979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  720.677203] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  720.762427] Process ip (pid: 4563, threadinfo ffff880422c78000, task ffff88042d9b4b90)
[  720.856990] Stack:
[  720.880968]  ffff880422c796b8 0000000000000000 ffff88042bc53b28 00000011000003ff
[  720.969736]  ffff880422c796b8 ffff88042d9aa720 ffff88042bc6c920 0000000000000001
[  721.058506]  ffff88042d9aa0a0 0000000000000007 ffff880422c796f8 ffffffff813621a6
[  721.147273] Call Trace:
[  721.176450]  [<ffffffff813621a6>] __netxen_nic_down+0x12c/0x13f
[  721.247140]  [<ffffffff81362290>] netxen_nic_close+0x13/0x17
[  721.314721]  [<ffffffff813fc7dc>] __dev_close_many+0x90/0xbc
[  721.382295]  [<ffffffff813fc839>] __dev_close+0x31/0x42
[  721.444687]  [<ffffffff813fa38d>] __dev_change_flags+0xb9/0x13d
[  721.515378]  [<ffffffff813fd16e>] dev_change_flags+0x1c/0x52
[  721.582958]  [<ffffffff814075ad>] do_setlink+0x2c0/0x7d2
[  721.646384]  [<ffffffff8146fcb5>] ? inet6_fill_ifla6_attrs+0x205/0x219
[  721.724342]  [<ffffffff814085a3>] rtnl_newlink+0x26b/0x4a1
[  721.789842]  [<ffffffff81408400>] ? rtnl_newlink+0xc8/0x4a1
[  721.856384]  [<ffffffff81419ebb>] ? netlink_sendmsg+0x22b/0x2b2
[  721.927079]  [<ffffffff8109fcf3>] ? check_object+0x13b/0x1df
[  721.994659]  [<ffffffff8140831d>] rtnetlink_rcv_msg+0x22c/0x247
[  722.065351]  [<ffffffff814080f1>] ? rtnetlink_rcv+0x28/0x28
[  722.131893]  [<ffffffff81419c3f>] netlink_rcv_skb+0x3e/0x8f
[  722.198433]  [<ffffffff814080ea>] rtnetlink_rcv+0x21/0x28
[  722.262897]  [<ffffffff81419a01>] netlink_unicast+0x134/0x1ab
[  722.331515]  [<ffffffff81419eda>] netlink_sendmsg+0x24a/0x2b2
[  722.400131]  [<ffffffff813ebb1a>] sock_sendmsg+0xb8/0xd1
[  722.463559]  [<ffffffff81073a29>] ? filemap_fault+0x199/0x35d
[  722.532171]  [<ffffffff810732ab>] ? unlock_page+0x2d/0x32
[  722.596636]  [<ffffffff810894c2>] ? __do_fault+0x3ce/0x409
[  722.662140]  [<ffffffff813eb239>] ? move_addr_to_kernel+0x3a/0x51
[  722.734909]  [<ffffffff813f4f76>] ? verify_iovec+0x59/0xaf
[  722.800410]  [<ffffffff813ec675>] __sys_sendmsg+0x1b9/0x23e
[  722.866951]  [<ffffffff8108c2ab>] ? handle_mm_fault+0x1b7/0x1cd
[  722.937643]  [<ffffffff8101f08e>] ? do_page_fault+0x336/0x375
[  723.006260]  [<ffffffff81090681>] ? do_brk+0x2e4/0x346
[  723.067610]  [<ffffffff813ec80d>] sys_sendmsg+0x3d/0x5e
[  723.129998]  [<ffffffff814e62a2>] system_call_fastpath+0x16/0x1b
[  723.201724] Code: e6 ff 48 c7 43 08 00 00 00 00 4c 8d 63 08 c7 45 cc 00 00 00 00 eb 7d 49 83 c4 10 4d 8b 34 24 4d 85 f6 74 6d 49 8b 45 58 45 31 ff <4d> 8b 4c 24 08 48 85 c0 74 13 4c 8d b8 88 00 00 00 48 8b 80 b8
[  723.434030] RIP  [<ffffffff81364295>] netxen_release_tx_buffers+0xc0/0x175
[  723.516249]  RSP <ffff880422c79668>
[  723.557878] CR2: ffffc900103cd000
[  723.597832] ---[ end trace d4f0eb8a4ca35e8b ]---
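The oops can be cross-checked against the register dump. Assuming the bytes at the faulting IP, 4d 8b 4c 24 08 (marked <4d> in the Code: line), decode to mov 0x8(%r12),%r9, a 64-bit load from R12 + 8, the faulting address in CR2 should equal R12 + 8. The following Python lines do that arithmetic; load_address is a hypothetical helper.

```python
PAGE_SIZE = 4096

R12 = 0xFFFFC900103CCFF8  # from the register dump
CR2 = 0xFFFFC900103CD000  # faulting address reported by the oops


def load_address(base, disp):
    """Effective address of a base + disp8 operand, wrapped to 64 bits."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


fault = load_address(R12, 0x8)
print(hex(fault), fault == CR2)
print("offset of R12 in its page:", hex(R12 % PAGE_SIZE))
print("offset of fault in its page:", hex(fault % PAGE_SIZE))
```

If the decode is right, R12 points at the last 8 bytes of one page and the load hits the first byte of the next page, whose PTE is 0 according to the PGD/PUD/PMD/PTE line. That is consistent with a read running off the end of a buffer. The earlier 4d 8b 34 24 bytes would similarly be mov (%r12),%r14, and R14 holds 0x12, the same value as the device address in the DMA-API warning.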


-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
UCLouvain