[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1396111046.2898.88.camel@deadeye.wl.decadent.org.uk>
Date: Sat, 29 Mar 2014 16:37:26 +0000
From: Ben Hutchings <ben@...adent.org.uk>
To: Russell King - ARM Linux <linux@....linux.org.uk>
Cc: netdev@...r.kernel.org
Subject: Re: Proper suspend/resume flow
On Wed, 2014-03-26 at 11:58 +0000, Russell King - ARM Linux wrote:
> I've been looking at the Freescale FEC driver recently, and I've noticed
> that it's suspend path looks like this:
>
> if (netif_running(ndev)) {
> netif_device_detach(ndev);
> fec_stop(ndev);
> }
>
> I believe this to be unsafe, because if the device is active, then we may
> have packets in the transmit ring. It may be possible for the ring to be
> cleaned between netif_device_detach() and fec_stop(), resulting in a call
> to netif_wake_queue(), which doesn't seem to take account of whether the
> device has been detached.
>
> I wondered if this was just limited to the freescale driver, and I found
> a similar pattern in e1000e - __e1000e_shutdown() calls netif_device_detach()
> as the very first thing, before doing anything else. This seems to be a
> common pattern. 8139cp.c does this, the second call seems to be rather
> redundant:
>
> netif_device_detach (dev);
> netif_stop_queue (dev);
>
> Tulip on the other hand shuts the hardware down and cleans the transmit
> ring first before calling netif_device_detach(), thus ensuring that there
> won't be any transmit completions before this call. It doesn't stop the
> queue before this, so presumably in a SMP environment, it is possible for
> packets to get queued after the transmit ring has been cleaned.
>
> So, what is the correct approach to suspending a net device? When should
> netif_device_detach() be called?
[...]
My advice here is based on Solarflare's extensive stress-testing of the
sfc driver control path and the race conditions it uncovered. It did
not cover suspend/resume, but it does cover MTU and ring size changes
which involve the same sort of stop/start while the device is still
marked as running.
- Given the way the watchdog works, I think netif_device_detach() must
be called first if you are going to stop TX queues in a control
operation other than ndo_stop (commits e4abce853849, 29c69a488264).
- The driver must check netif_device_present() before calling
netif_wake_queue(). Perhaps netif_wake_queue() itself should check
that, if many drivers need to do it.
- Calls to netif_device_detach() may need to be synchronised with the TX
scheduler (commits c2f3b8e3a44b, 35205b211c8d). To support this, the
function efx_device_detach_sync() maybe should be turned into a general
netif_device_detach_sync() (and then possibly needs to disable IRQs).
Ben.
--
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had thought.
... I realized that a large part of my life from then on was going to be spent
in finding mistakes in my own programs. - Maurice Wilkes, 1949
Download attachment "signature.asc" of type "application/pgp-signature" (812 bytes)
Powered by blists - more mailing lists