Message-ID: <1374438274.2804.80.camel@deadeye.wl.decadent.org.uk>
Date:	Sun, 21 Jul 2013 21:24:34 +0100
From:	Ben Hutchings <bhutchings@...arflare.com>
To:	Richard Weinberger <richard@....at>
CC:	<rl@...lgate.ch>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] via-rhine: Fix tx_timeout handling
On Sun, 2013-07-21 at 18:32 +0200, Richard Weinberger wrote:
> On 21.07.2013 18:18, Ben Hutchings wrote:
> > On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
> >> rhine_reset_task() misses to call netif_stop_queue(),
> >> this can lead to a crash if work is still scheduled while
> >> we're resetting the tx queue.
> >>
> >> Fixes:
> >> [   93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
> >> [   93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6e
> >>
> >> Signed-off-by: Richard Weinberger <richard@....at>
> >> ---
> >>  drivers/net/ethernet/via/via-rhine.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
> >> index b75eb9e..57e1b40 100644
> >> --- a/drivers/net/ethernet/via/via-rhine.c
> >> +++ b/drivers/net/ethernet/via/via-rhine.c
> >> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
> >>  		goto out_unlock;
> >>  
> >>  	napi_disable(&rp->napi);
> >> +	netif_stop_queue(dev);
> > 
> > This is not really fixing the bug because there is no synchronisation
> > with the TX scheduler.  You can call netif_tx_disable() instead to do
> > that.
> 
> I guess other drivers suffer from that too.
> A quick grep showed that not many drivers are using netif_tx_disable().
> 
> > (I also think that it is preferable to use
> > netif_device_{detach,attach}() to stop the queue during reconfiguration,
> > as this is independent of TX completions and the watchdog.)

Actually, this is not independent of TX completions - netif_wake_queue()
will still start the TX scheduler while the device is not present, so
you have to avoid that.

> So the correct down sequence is napi_disable() -> netif_tx_disable() -> netif_device_detach()?

No, that's redundant.  You can do:
	napi_disable();
	netif_tx_lock_bh(); /* sync with TX scheduler */
	netif_device_detach();
	netif_tx_unlock_bh();
and then when the queue is ready to use again:
	netif_device_attach();
	napi_enable();
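
To make the ordering above easier to follow, here is a rough userspace
model of it.  This is only a sketch: every model_* name is hypothetical,
the struct is a stand-in rather than struct net_device, and the stubs
only track state flags instead of doing what the real kernel functions
do.  The present-check before waking the queue is one possible way to
"avoid that" (in a driver it would presumably be netif_device_present());
it is an assumption, not something the via-rhine driver is known to do.

```c
/* Userspace model only: these stubs mimic the ORDER of the kernel calls
 * discussed in this thread.  They are not the kernel implementations. */
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	bool napi_enabled;	/* NAPI poll is allowed to run */
	bool tx_locked;		/* models holding the TX lock (netif_tx_lock_bh) */
	bool present;		/* models the "device present" state */
	bool queue_stopped;	/* models a stopped TX queue */
};

void model_napi_disable(struct model_dev *d) { d->napi_enabled = false; }
void model_napi_enable(struct model_dev *d)  { d->napi_enabled = true; }

void model_tx_lock_bh(struct model_dev *d)
{
	assert(!d->tx_locked);
	d->tx_locked = true;
}

void model_tx_unlock_bh(struct model_dev *d)
{
	assert(d->tx_locked);
	d->tx_locked = false;
}

/* Simplified: mark the device absent and stop the queue.  The assert
 * encodes the point of the sequence: detach happens under the TX lock. */
void model_device_detach(struct model_dev *d)
{
	assert(d->tx_locked);
	d->present = false;
	d->queue_stopped = true;
}

/* Simplified: mark the device present again and restart the queue. */
void model_device_attach(struct model_dev *d)
{
	d->present = true;
	d->queue_stopped = false;
}

/* Assumed guard for a TX-completion path: do not wake the queue
 * while the device is marked absent. */
void model_tx_complete_wake(struct model_dev *d)
{
	if (d->present)
		d->queue_stopped = false;
}

/* Down sequence: stop NAPI, then detach while synchronised with TX. */
void model_quiesce(struct model_dev *d)
{
	model_napi_disable(d);
	model_tx_lock_bh(d);	/* sync with TX scheduler */
	model_device_detach(d);
	model_tx_unlock_bh(d);
}

/* Up sequence, once the queue is ready to use again. */
void model_resume(struct model_dev *d)
{
	model_device_attach(d);
	model_napi_enable(d);
}
```

Calling model_quiesce() on a running device leaves it absent with the
queue stopped and the lock released; a guarded wake in between does
nothing, and model_resume() restores the running state.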
Ben.
-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
