lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5333F919.7080405@freescale.com>
Date:	Thu, 27 Mar 2014 12:10:33 +0200
From:	Claudiu Manoil <claudiu.manoil@...escale.com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>,
	<netdev@...r.kernel.org>
Subject: Re: Proper suspend/resume flow

On 3/26/2014 1:58 PM, Russell King - ARM Linux wrote:
> I think the right solution for a driver which does all its processing in
> the NAPI poll function would be:
>
> 	if (netif_running(ndev)) {
> 		napi_disable(napi);
> 		netif_device_detach(ndev);
> 		disable_hardware();
> 	}
>
> and upon resume, the reverse order.  The theory being:
>
> - napi_disable() is to stop the driver processing its rings and possibly
>    waking up the transmit queue(s) after the following netif_device_detach().
> - netif_device_detach() to stop new packets being queued to the device.


There's a risk to disabling the NAPI before stopping the transmission
(i.e. stopping the Tx queues).  The net stack might continue to enqueue
Tx packets and, if the Tx BD rings get full, the driver's start_xmit
will return TX_BUSY which leads to netdev watchdog triggering Tx
timeout (and the ndo_tx_timeout hook from the driver will re-enable the
Tx queues).
Maybe it's unlikely to get the BD rings full during suspend, but there's
another case too - BQL (if the driver supports BQL).  BQL needs a
confirmation for each xmitted byte, and the confirmation comes from
Tx completion processing from NAPI context (see
netdev_tx_completed_queue()).  If the confirmation is delayed too much,
BQL blocks the Tx queues to trigger the same netdev Tx timeout watchdog
mechanism.

I think there's a more general problem to this: how to properly stop
the Tx traffic from the driver.  And I think an approach to solve it
would be to stop the Tx queues (i.e. netif_tx_stop_all_queues()) before
disabling the NAPI processing, however the driver will need to use a
special state flag (like "DOWN" or "RESETTING") to prevent waking up
the Tx queues from NAPI while the Tx traffic is being stopped.
There are some drivers implementing such "synchronization" flags to
prevent the Tx congestion mechanism to wake up the Tx queues while the
device is resetting or brought down.  But there's no generic
implementation for this (there are differences in implementation from
driver to driver).

Claudiu


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ