Date:	Mon, 29 Jul 2013 02:10:31 +0000
From:	Duan Fugang-B38611 <B38611@...escale.com>
To:	Ben Hutchings <bhutchings@...arflare.com>
CC:	Stephen Hemminger <stephen@...workplumber.org>,
	Uwe Kleine-König 
	<u.kleine-koenig@...gutronix.de>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Estevam Fabio-R49496 <r49496@...escale.com>,
	Li Frank-B20596 <B20596@...escale.com>,
	Shawn Guo <shawn.guo@...aro.org>,
	"kernel@...gutronix.de" <kernel@...gutronix.de>,
	Hector Palacios <hector.palacios@...i.com>,
	Tim Sander <tim.sander@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: RE: [PATCH] net/fec: call netif_carrier_off when not having link

>netif_stop_queue() *must not* be called before netif_carrier_off(), otherwise the TX watchdog can fire immediately.
>The TX watchdog only knows when the last packet was passed to the driver, not when the queue was stopped.
>The last packet could have been added an arbitrarily long time before the link went down, so it may appear that the timeout has already expired.
>
>Although it is safe to call netif_stop_queue() after netif_carrier_off(), it is not useful.
>netif_stop_queue() should only be called from your ndo_start_xmit operation and only because the queue is full. 
>Any other reason to stop should be communicated to the kernel using netif_carrier_off() or netif_device_detach().
>
>Ben.

Agreed.
I remember you said:
The watchdog fires when the software queue has been stopped *and* the link has been reported as up for over dev->watchdog_timeo ticks.
The software queue should be stopped if the hardware queue is full or nearly full. If the software queue remains stopped and the link is still reported up, then one of these things is happening:

1. The link went down but the driver didn't notice, or sent a transmit packet which never completes
2. TX completions are not being indicated or handled correctly
3. The hardware TX path has locked up
4. The link is stalled by excessive pause frames or collisions
5. Timeout is too low and/or low watermark is too high
(there may be other explanations)

The watchdog is primarily meant to deal with case 3, though all of cases 1-3 may be worked around by resetting the hardware.
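To make the firing condition concrete, here is a small userspace model of the check described above: the carrier must be reported up, the queue must be stopped, and more than watchdog_timeo ticks must have passed since the last packet was handed to the driver. It is a sketch under stated assumptions, not kernel code; `struct model_dev`, `tick_after()` and `watchdog_would_fire()` are hypothetical names, and only `tick_after()` deliberately copies the arithmetic of the kernel's `time_after()` macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the state the TX watchdog inspects for a device with a
 * single TX queue. Field names are hypothetical stand-ins, not kernel API. */
struct model_dev {
    bool carrier_ok;              /* stands in for netif_carrier_ok(dev) */
    bool queue_stopped;           /* stands in for "TX queue is stopped" */
    unsigned long trans_start;    /* tick when the last packet reached the driver */
    unsigned long watchdog_timeo; /* like dev->watchdog_timeo, in ticks */
};

/* Wrap-safe "a is later than b", written the same way as the kernel's
 * time_after() macro. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* True if the modelled watchdog would declare a TX timeout at tick 'now':
 * carrier reported up, queue stopped, and more than watchdog_timeo ticks
 * since the last packet was handed to the driver. */
static bool watchdog_would_fire(const struct model_dev *d, unsigned long now)
{
    assert(d->watchdog_timeo > 0);
    if (!d->carrier_ok)
        return false; /* no carrier: the model does not look at the queue */
    return d->queue_stopped &&
           tick_after(now, d->trans_start + d->watchdog_timeo);
}
```

For example, with the last packet at tick 100, a timeout of 500 and the link dropping at tick 1000: stopping the queue while the carrier is still up makes `watchdog_would_fire(&d, 1000)` true straight away, because the model only sees the stale `trans_start`. Clearing `carrier_ok` first keeps it false, which is the ordering argument made in this thread.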


Thanks,
Andy

-----Original Message-----
From: Ben Hutchings [mailto:bhutchings@...arflare.com] 
Sent: Friday, July 26, 2013 11:32 PM
To: Duan Fugang-B38611
Cc: Stephen Hemminger; Uwe Kleine-König; netdev@...r.kernel.org; David S. Miller; Estevam Fabio-R49496; Li Frank-B20596; Shawn Guo; kernel@...gutronix.de; Hector Palacios; Tim Sander; Steven Rostedt; Thomas Gleixner
Subject: Re: [PATCH] net/fec: call netif_carrier_off when not having link

On Fri, 2013-07-26 at 09:35 +0000, Duan Fugang-B38611 wrote:
> On Fri, 26 Jul 2013 12:04, Stephen Hemminger wrote:
> >> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> >> b/drivers/net/ethernet/freescale/fec_main.c
> >> index 0642006..631bd5a 100644
> >> --- a/drivers/net/ethernet/freescale/fec_main.c
> >> +++ b/drivers/net/ethernet/freescale/fec_main.c
> >> @@ -280,11 +280,6 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> >>  	unsigned short	status;
> >>  	unsigned int index;
> >>  
> >> -	if (!fep->link) {
> >> -		/* Link is down or auto-negotiation is in progress. */
> >> -		return NETDEV_TX_BUSY;
> >> -	}
> >> -
> >
> >That is a bug anyway, since it would cause a spin loop in the transmit code (even without -rt).
> >If the driver cared to test it (most drivers just let the hardware deal 
> >with this situation), then it should free the packet and return NETDEV_TX_OK.
> 
> When the link is down, the logic is to:
> 	- call netif_stop_queue() to stop the queue
> 	- and then report that there is no link using netif_carrier_off().
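Stephen's point about returning NETDEV_TX_BUSY on a down link can be illustrated with a crude userspace model. The "qdisc" below simply resubmits the same packet while the driver answers BUSY, which is an assumption standing in for the real requeue-and-reschedule behaviour; every name here (`model_xmit_busy`, `model_xmit_drop`, `model_qdisc_push`, and the enum values that mirror NETDEV_TX_OK/NETDEV_TX_BUSY) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model of the transmit path; all names are hypothetical.
 * The two results mirror NETDEV_TX_OK and NETDEV_TX_BUSY. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

struct model_pkt { int len; };

struct model_priv {
    bool link;                /* cached link state, like fep->link */
    unsigned long tx_dropped; /* packets freed because there was no link */
};

/* Behaviour removed by the patch: refuse the packet while the link is down. */
static enum model_tx model_xmit_busy(struct model_priv *p, struct model_pkt *pkt)
{
    if (!p->link)
        return MODEL_TX_BUSY;  /* caller keeps the packet and retries */
    free(pkt);                 /* model: "sent" means consumed and freed */
    return MODEL_TX_OK;
}

/* Stephen's suggestion: with no link, drop the packet and still report OK. */
static enum model_tx model_xmit_drop(struct model_priv *p, struct model_pkt *pkt)
{
    if (!p->link)
        p->tx_dropped++;       /* no link: count the drop */
    free(pkt);                 /* kernel equivalent: e.g. dev_kfree_skb_any() */
    return MODEL_TX_OK;
}

typedef enum model_tx (*model_xmit_fn)(struct model_priv *, struct model_pkt *);

/* Crude stand-in for the qdisc layer: resubmit the same packet while the
 * driver answers BUSY, giving up after max_tries. Returns the number of
 * driver calls, or -1 on allocation failure. */
static int model_qdisc_push(model_xmit_fn xmit, struct model_priv *p, int max_tries)
{
    assert(max_tries > 0);
    struct model_pkt *pkt = malloc(sizeof *pkt);
    if (!pkt)
        return -1;
    pkt->len = 64;

    for (int tries = 1; tries <= max_tries; tries++) {
        if (xmit(p, pkt) == MODEL_TX_OK)
            return tries;      /* driver took ownership of the packet */
    }
    free(pkt);                 /* only the model gives up; the stack would not */
    return max_tries;
}
```

With the link down, `model_qdisc_push(model_xmit_busy, ...)` uses up every allowed try without making progress, while `model_xmit_drop` consumes the packet on the first call and bumps `tx_dropped`.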

netif_stop_queue() *must not* be called before netif_carrier_off(), otherwise the TX watchdog can fire immediately.  The TX watchdog only knows when the last packet was passed to the driver, not when the queue was stopped.  The last packet could have been added an arbitrarily long time before the link went down, so it may appear that the timeout has already expired.

Although it is safe to call netif_stop_queue() after netif_carrier_off(), it is not useful.  netif_stop_queue() should only be called from your ndo_start_xmit operation and only because the queue is full.  Any other reason to stop should be communicated to the kernel using netif_carrier_off() or netif_device_detach().

Ben.

> But the flow must be handled in the fec_enet_adjust_link() function, not in xmit().
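A minimal sketch of the flow Ben recommends, as a userspace model: the link-change callback reports link loss only through the carrier and leaves the queue alone, since stopping the queue is reserved for start_xmit on a full ring. `struct model_ndev` and `model_adjust_link()` are hypothetical and do not reproduce the real fec_enet_adjust_link(). In a phylib-based driver the PHY core may already toggle the carrier before invoking the callback; that is worth checking for the kernel version in question rather than assuming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the link-change path; names are hypothetical. */
struct model_ndev {
    bool link;          /* driver's cached link state, like fep->link */
    bool carrier_ok;    /* what netif_carrier_on()/off() would report */
    bool queue_stopped; /* left alone here: only start_xmit stops the queue */
};

/* Hypothetical link-change callback, called with the PHY's current state.
 * Link loss is reported through the carrier only; the queue is not stopped. */
static void model_adjust_link(struct model_ndev *nd, bool phy_link)
{
    assert(nd != NULL);
    if (phy_link == nd->link)
        return;                 /* no transition, nothing to report */

    nd->link = phy_link;
    nd->carrier_ok = phy_link;  /* kernel: netif_carrier_on() or netif_carrier_off() */
}
```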

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
