Date:	Wed, 26 Nov 2008 22:12:20 +0100
From:	Willy Tarreau <w@....eu>
To:	Matt Carlson <mcarlson@...adcom.com>
Cc:	Roger Heflin <rogerheflin@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>
Subject: Re: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x17e() with tg3 network

Hi Matt,

On Tue, Nov 25, 2008 at 09:54:13AM -0800, Matt Carlson wrote:
> On Mon, Nov 24, 2008 at 09:31:28PM -0800, Willy Tarreau wrote:
> > On Mon, Nov 24, 2008 at 05:52:23PM -0800, Matt Carlson wrote:
> > (...)
> > > > tg3: eth0: transmit timed out, resetting
> > > > tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000006]
> > > > tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
> > > > tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> > > > tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
> > > > tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
> > > > tg3: eth0: Link is down.
> > > > tg3: eth0: Link is up at 100 Mbps, full duplex.
> > > > tg3: eth0: Flow control is on for TX and on for RX.
> > > > 
> > > > The ease with which I reproduce it here clearly indicates that this is
> > > > related to the switch, probably just the fact that it is at 100 Mbps.
> > > > Unfortunately this evening I must go, but I still have one 100 Mbps
> > > > switch somewhere at home, I'll reproduce the same test ASAP in order
> > > > to bisect the issue.
> > > > 
> > > > Regards,
> > > > Willy
> > > 
> > > Does turning off flow control help at all?
> > 
> > I have not tested but I will. I hope to be able to trigger the problem
> > on other similar switches, because I'm only once a week connected to
> > the culprit...
> 
> I can't say for certain, but I suspect the problem might be more
> associated with the link speed than the particular switch you are using.
> Can you try autoneg'ing down to a slower speed and see if that helps
> make the problem more reproducible?

I've run a new test on a switch I have here at home (another el-cheapo,
non-manageable 100 Mbps switch, a Netgear this time). Unfortunately I cannot
reproduce the problem at all. I have disabled flow control on my laptop, but
it did not have any effect. I have also disabled auto-neg and manually forced
the speed to 100/Full on my laptop, and could not reproduce the problem
either (though the throughput was much lower, obviously because the switch
fell back to 100/Half when it did not see my NWay frames).

I have tried unplugging the cable during transfers and changing the
negotiation settings during transfers, trying to trigger artifacts, but with
no result. So I think that I will really need to debug this on the "faulty"
switch next Monday. It does not surprise me much, because we don't see that
many reports of a similar problem, even though the tg3 is very common
in laptops. I just hope it's a recent regression, as I'd prefer to avoid
having to bisect from a very old kernel.

I'll keep you informed,
Willy
