Message-ID: <309B89C4C689E141A5FF6A0C5FB2118B78D72A14@ORSMSX101.amr.corp.intel.com>
Date: Sat, 21 Feb 2015 01:59:35 +0000
From: "Brown, Aaron F" <aaron.f.brown@...el.com>
To: Thomas Jarosch <thomas.jarosch@...ra2net.com>
CC: "Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
'Linux Netdev List' <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>,
e1000-devel <e1000-devel@...ts.sourceforge.net>
Subject: RE: [bisected regression] e1000e: "Detected Hardware Unit Hang"
> -----Original Message-----
> From: Thomas Jarosch [mailto:thomas.jarosch@...ra2net.com]
> Sent: Friday, February 13, 2015 8:15 AM
> To: Brown, Aaron F
> Cc: Kirsher, Jeffrey T; 'Linux Netdev List'; Eric Dumazet; e1000-devel
> Subject: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
>
> Hi Aaron,
>
> On Thursday, 12. February 2015 23:28:27 Brown, Aaron F wrote:
> > I do not have any real info. I had been asked to try and reproduce some
> > unit hangs (maybe for this) recently and did not succeed in producing
> > them on the parts I have. Reading through the thread I see this is
> > showing up in a NAT environment. The port that is getting the unit hang
> > in the NAT system?
>
> yes, the e1000e NIC is serving the NATed Windows client.
>
> The setup was outlined here:
>
> http://marc.info/?l=linux-netdev&m=142133691713824&w=2
>
> > I will make some attempts at replicating this with the port in a NAT and
> > or forwarding role. Has a bug been opened for this? Or has information
> > for this specific unit hang been entered into one of the other unit hang
> > bugs opened against e1000e?
>
> I didn't do anything(tm). This report sounds like the same issue:
>
> http://ehc.ac/p/e1000/bugs/378/
>
> Oliver Wagner wrote the problem started to appear
> after updating from kernel 3.5 to 3.8.0.35 (new frag size code).
>
> I just noticed now he wrote he has two identical boxes:
>
> ---------------------------------------------------
> - Box with symptoms: Router/Firewall, packet forwarding
> between different VLANs on eth0 and eth1
> - Box without symptoms: Fileserver, eth0/eth1 bonded
> (VLANs used, but no forwarding)
> ---------------------------------------------------
>
> So it looks like it's related to forwarding somehow,
> I've made the same experience IIRC.
Thanks, that (and the multiple bug write-ups on sourceforge) gave me more than enough to go on. I was able to replicate it on a handful of systems in my lab. On affected systems, setting up a NAT and stressing the interfaces with even moderate traffic levels triggers it pretty quickly. It appears that the NAT part is unnecessary: just setting the systems up as a software router and running some traffic across them also triggers it, giving the same apparent behavior (tx hang, watchdog timeout trace, port reset).
With an internal reproduction of the issue in hand, I have created an internal bug report, described my set of reproductions, referenced the similar external ones, and assigned it to our current e1000e developer.
Thanks again,
Aaron
>
> Cheers,
> Thomas