netdev - Re: e1000 watchdog timer fails to start after ethool settings change

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.WNT.2.00.0904160921570.732@jbrandeb-desk1.amr.corp.intel.com>
Date:	Thu, 16 Apr 2009 09:43:55 -0700 (Pacific Daylight Time)
From:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>
To:	Ed Swierk <eswierk@...stanetworks.com>
cc:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	e1000-devel@...ts.sourceforge.net, jesse.brandeburg@...el.com,
	david.graham@...el.com
Subject: Re: e1000 watchdog timer fails to start after ethool settings
 change

added maintainer's list

On Wed, 15 Apr 2009, Ed Swierk wrote:
> A recent patch to e1000, intended to avoid a race in the interrupt code,
> seems to prevent the watchdog timer from resuming properly.  It neuters
> the effect of
> 
>   /* fire a link change interrupt to start the watchdog */ 
>   ew32(ICS, E1000_ICS_LSC); 
>  
> in e1000_up() when the __E1000_RESETTING flag is set, for example when
> invoked by e1000_set_settings().

Ugh, ethtool, my old nemesis.
 
> The result is that running
> 
>   ethtool -s eth0 autoneg on
> 
> in Qemu with an emulated e1000 NIC causes the kernel to stop noticing
> link status changes, leaving the link down until I prod the driver with
> ifconfig eth0 down; ifconfig eth0 up.

Wow, thanks very much for reporting this, it is a pretty subtle bug that 
I'm sure wasn't too easy to find.

I've got to think some on how to fix this and I hope to have a patch in 
the next few days.  Would you agree not very high priority?

My guess is this may be related to the people that reported vmware failing 
with this same change.

Why it doesn't occur on real physical hardware will be interesting to find 
out as well.

> Here is the patch causing the bug:
> 
> > From: Jesse Brandeburg <jesse.brandeburg@...xxxxxx>
> > 
> > A nasty bug was found where an MTU change (or anything else that caused a
> > reset) could race with the interrupt code.  The interrupt code was entered
> > by a shared interrupt during the MTU change.
> > 
> > This change prevents the interrupt code from running while the driver is in
> > the middle of its reset path.
> > 
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@...xxxxxx>
> > ---
> > 
> >  drivers/net/e1000/e1000_main.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> > index 26474c9..c986978 100644
> > --- a/drivers/net/e1000/e1000_main.c
> > +++ b/drivers/net/e1000/e1000_main.c
> > @@ -31,7 +31,7 @@
> >  
> >  char e1000_driver_name[] = "e1000";
> >  static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
> > -#define DRV_VERSION "7.3.20-k3-NAPI"
> > +#define DRV_VERSION "7.3.21-k3-NAPI"
> >  const char e1000_driver_version[] = DRV_VERSION;
> >  static const char e1000_copyright[] = "Copyright (c) 1999-2006 Intel Corporation.";
> >  
> > @@ -3712,7 +3712,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> >  	struct e1000_hw *hw = &adapter->hw;
> >  	u32 rctl, icr = er32(ICR);
> >  
> > -	if (unlikely(!icr))
> > +	if (unlikely((!icr) || test_bit(__E1000_RESETTING, &adapter->flags)))
> >  		return IRQ_NONE;  /* Not our interrupt */
> >  
> >  	/* IMS will not auto-mask if INT_ASSERTED is not set, and if it is
> > 
> 
> Undoing the patch fixes the problem on a Qemu emulated e1000 NIC:
> 
> Index: linux-2.6.29.1/drivers/net/e1000/e1000_main.c
> ===================================================================
> --- linux-2.6.29.1.orig/drivers/net/e1000/e1000_main.c
> +++ linux-2.6.29.1/drivers/net/e1000/e1000_main.c
> @@ -3712,7 +3712,7 @@ static irqreturn_t e1000_intr(int irq, v
>  	struct e1000_hw *hw = &adapter->hw;
>  	u32 rctl, icr = er32(ICR);
>  
> -	if (unlikely((!icr) || test_bit(__E1000_RESETTING, &adapter->flags)))
> +	if (unlikely(!icr))
>  		return IRQ_NONE;  /* Not our interrupt */
>  
>  	/* IMS will not auto-mask if INT_ASSERTED is not set, and if it is
> 
> ---
> 
> Of course this leaves the original problem unfixed; hopefully someone
> can suggest a better solution.

Give us a few days, I think we can come up with something.

Jesse
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html