linux-kernel - Re: TG3 network data corruption regression 2.6.24/2.6.23.4

Message-ID: <20080415001207.GA11852@localdomain>
Date:	Mon, 14 Apr 2008 17:12:07 -0700
From:	"Matt Carlson" <mcarlson@...adcom.com>
To:	"Tony Battersby" <tonyb@...ernetics.com>
cc:	"Herbert Xu" <herbert@...dor.apana.org.au>,
	"Michael Chan" <mchan@...adcom.com>,
	"David Miller" <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>, gregkh@...e.de,
	linux-kernel@...r.kernel.org
Subject: Re: TG3 network data corruption regression 2.6.24/2.6.23.4

Hi Tony.  Sorry for the radio silence.

Michael and I have discussed this problem a bit.  Another possibility is
that the chip may have difficulty with TX buffers that are not dword
(4-byte) aligned.  Since we already know the RX side has this problem,
it isn't far-fetched to think it could affect the TX side too.  Can you
give the following patch a try and see whether the corruption still
happens?


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 96043c5..810c711 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4135,11 +4135,20 @@ static int tigon3_dma_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
 				       u32 last_plus_one, u32 *start,
 				       u32 base_flags, u32 mss)
 {
-	struct sk_buff *new_skb = skb_copy(skb, GFP_ATOMIC);
+	struct sk_buff *new_skb;
 	dma_addr_t new_addr = 0;
 	u32 entry = *start;
 	int i, ret = 0;
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5701)
+		new_skb = skb_copy(skb, GFP_ATOMIC);
+	else {
+		int more_headroom = 4 - (skb->mac_header & 3);
+
+		new_skb = skb_copy_expand(skb, skb_headroom(skb) + more_headroom,
+					  skb_tailroom(skb), GFP_ATOMIC);
+	}
+
 	if (!new_skb) {
 		ret = -1;
 	} else {
@@ -4465,6 +4474,10 @@ static int tg3_start_xmit_dma_bug(struct sk_buff *skb, struct net_device *dev)
 	if (tg3_4g_overflow_test(mapping, len))
 		would_hit_hwbug = 1;
 
+	/* Force the 5701 into the double copy path. */
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+		would_hit_hwbug = 1;
+
 	tg3_set_txd(tp, entry, mapping, len, base_flags,
 		    (skb_shinfo(skb)->nr_frags == 0) | (mss << 1));
 


On Wed, Feb 20, 2008 at 10:18:58AM -0500, Tony Battersby wrote:
> Herbert Xu wrote:
> > On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
> >   
> >> Update: when I revert Herbert's patch in addition to applying your
> >> patch, the iSCSI performance goes back up to 115 MB/s again in both
> >> directions.  So it looks like turning off SG for TX didn't itself cause
> >> the performance drop, but rather that the performance drop is just
> >> another manifestation of whatever bug is causing the data corruption.
> >>     
> >
> > Interesting.  So the workload that regressed is mostly RX with a
> > little TX traffic? Can you try to reproduce this with something
> > like netperf to eliminate other variables?
> >
> > This is all very puzzling since the patch in question shouldn't
> > change an RX load at all.
> >
> > Thanks,
> >   
> We have established that the slowdown was caused by TCP checksum errors
> and retransmits.  I assume that the slowdown in my test was due to the
> light TX rather than the heavy RX.  I am no TCP protocol expert, but
> perhaps heavy TX (such as iperf) might not be affected as much because
> the wire stays busy while waiting for the retransmit, whereas with my
> light TX iSCSI load, the wire goes idle while waiting for the retransmit
> because the iSCSI state machine is stalled.
> 
> Tony


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/