lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <46F0C51B.2070205@netxen.com>
Date:	Wed, 19 Sep 2007 12:13:39 +0530
From:	Dhananjay Phadke <dhananjay@...xen.com>
To:	Vernon Mauery <vernux@...ibm.com>
CC:	netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: NetXen driver causing slab corruption in -RT kernels

That "00 0e 1e ..." is netxen's mac address, so sounds like the NIC is
dumping a frame in the skb already freed (and poisoned) by the stack.
I suppose -RT kernels preempt the softirq, giving a chance for this race.

The netxen driver doesn't seem to clear the mapped address of the skb in
rx descriptor, instead it replaces it with mapped address of newly
allocated skb. Also, the rx ring is replenished in bursts (at the end of
poll routine), this can help the race if preempted in between.

I am currently reworking the rx handling, hopefully it will fix this.

Thanks for reporting in detail.

-Dhananjay


Vernon Mauery wrote:
> In doing some stress testing of the NetXen driver, I found that my machine was 
> dying in all sorts of weird ways.  I saw several different crashes, BUG 
> messages in the TCP stack and some assert messages in the TCP stack as well.  
> I really didn't think that there could be six different bugs all at once in 
> the TCP/IP stack, so I started looking at possible memory corruption.
> 
> I first saw this on 2.6.16-rt22 with a backported netxen driver from 2.6.22.  
> I figured I should try the latest kernel, so I tried it on 2.6.23-rc6-git7 
> but could not trigger the slab corruption messages with CONFIG_DEBUG_SLAB, so 
> I figured the race must only exist in the -RT kernels.  Next I tried 
> 2.6.23-rc6-git7-rt1 (I applied patch-2.6.23-rc4-rt1 patch to 2.6.23-rc6-git7 
> and fixed the 5 failing hunks).  After an hour or so, lots of slab corruption 
> messages showed up:
> 
> Slab corruption: size-2048 start=f40c4670, len=2048
> Slab corruption: size-2048 start=f313cf48, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c0166be4>](kfree+0x80/0x95)
> 010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
> 020: 45 00 05 dc 92 ab 40 00 40 11 8a 5b 0a 02 02 03
> 030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Prev obj: start=f313c730, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
> Next obj: start=f313d760, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Slab corruption: size-2048 start=f395a6f0, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c0166be4>](kfree+0x80/0x95)
> 010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
> 020: 45 00 05 dc 92 ac 40 00 40 11 8a 5a 0a 02 02 03
> 030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Next obj: start=f395af08, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c0166be4>](kfree+0x80/0x95)
> 010: 6b 6b 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
> 020: 45 00 05 dc 92 aa 40 00 40 11 8a 5c 0a 02 02 03
> 030: 0a 02 02 04 80 0c 80 0d 05 c8 dc 39 00 00 00 00
> 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Next obj: start=f40c4e88, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<f8f06186>](netxen_post_rx_buffers_nodb+0x62/0x1f0 [netxen_nic])
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 00 0e 1e 00 16 13 00 0e 1e 00 19 3d 08 00
> 
> The stress test that I am running is basically a mixed bag of stuff I threw 
> together.  It runs eight concurrent netperf TCP streams and two concurrent 
> UDP streams in both directions, (and on both 10GbE interfaces), ping -f in 
> both directions, some more disk/cpu loads in the background and a little bit 
> of NFS traffic thrown in for good measure.
> 
> --Vernon
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ