Date:   Thu, 23 Mar 2023 09:05:35 -0700
From:   Alexander H Duyck <alexander.duyck@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net
Cc:     netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        willemb@...gle.com
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Wed, 2023-03-22 at 16:30 -0700, Jakub Kicinski wrote:
> A lot of drivers follow the same scheme to stop / start queues
> without introducing locks between xmit and NAPI tx completions.
> I'm guessing they all copy'n'paste each other's code.
> 
> Smaller drivers shy away from the scheme and introduce a lock
> which may cause deadlocks in netpoll.
> 
> Provide macros which encapsulate the necessary logic.
> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> rfc: https://lore.kernel.org/all/20230311050130.115138-1-kuba@kernel.org/
>  - perdicate -> predicate
>  - on race use start instead of wake and make a note of that
>    in the doc / comment at the start
> ---
>  include/net/netdev_queues.h | 171 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 171 insertions(+)
>  create mode 100644 include/net/netdev_queues.h
> 
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> new file mode 100644
> index 000000000000..64e059647274
> --- /dev/null
> +++ b/include/net/netdev_queues.h
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_NET_QUEUES_H
> +#define _LINUX_NET_QUEUES_H
> +
> +#include <linux/netdevice.h>
> +
> +/* Lockless queue stopping / waking helpers.
> + *
> + * These macros are designed to safely implement stopping and waking
> + * netdev queues without full lock protection. We assume that there can
> + * be no concurrent stop attempts and no concurrent wake attempts.
> + * The try-stop should happen from the xmit handler*, while wake up
> + * should be triggered from NAPI poll context. The two may run
> + * concurrently (single producer, single consumer).
> + *
> + * All descriptor ring indexes (and other relevant shared state) must
> + * be updated before invoking the macros.
> + *
> + * * the try-stop side does not reschedule Tx (netif_tx_start_queue()
> + *   instead of netif_tx_wake_queue()) so uses outside of the xmit
> + *   handler may lead to bugs
> + */
> +
> +#define netif_tx_queue_try_stop(txq, get_desc, start_thrs)		\
> +	({								\
> +		int _res;						\
> +									\
> +		netif_tx_stop_queue(txq);				\
> +									\
> +		smp_mb();						\
> +									\
> +		/* We need to check again in a case another		\
> +		 * CPU has just made room available.			\
> +		 */							\
> +		if (likely(get_desc < start_thrs)) {			\
> +			_res = 0;					\
> +		} else {						\
> +			netif_tx_start_queue(txq);			\
> +			_res = -1;					\
> +		}							\
> +		_res;							\
> +	})								\
> +

The issue I see here is that with this being a macro it abstracts away
the relationship between get_desc and the memory barrier. At a minimum
I think we should be treating get_desc as a function instead of just
passing it and its arguments as a single value. Maybe something more
like how read_poll_timeout passes the "op" and then uses it as a
function with args passed separately. What we want to avoid is passing
a precomputed value to this macro as get_desc.

In addition I would prefer to see _res initialized to the likely value
so that we can drop this down to a single conditional instead of having
two cases. Same goes for the macros below.
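
To make both points concrete, a rough, untested sketch of what I have
in mind (the "args..." form and the call style are just illustrative,
not a proposal for the final signature):

	#define netif_tx_queue_try_stop(txq, get_desc, start_thrs, args...)	\
		({								\
			int _res = 0;						\
										\
			netif_tx_stop_queue(txq);				\
										\
			/* Order the stop above against the re-read of the	\
			 * free descriptor count below.				\
			 */							\
			smp_mb();						\
										\
			/* Re-evaluate the free descriptor count after the	\
			 * barrier; a precomputed value would defeat the	\
			 * purpose of the re-check.				\
			 */							\
			if (unlikely(get_desc(args) >= (start_thrs))) {		\
				netif_tx_start_queue(txq);			\
				_res = -1;					\
			}							\
			_res;							\
		})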

> +/**
> + * netif_tx_queue_maybe_stop() - locklessly stop a Tx queue, if needed
> + * @txq:	struct netdev_queue to stop/start
> + * @get_desc:	get current number of free descriptors (see requirements below!)
> + * @stop_thrs:	minimal number of available descriptors for queue to be left
> + *		enabled
> + * @start_thrs:	minimal number of descriptors to re-enable the queue, can be
> + *		equal to @stop_thrs or higher to avoid frequent waking
> + *
> + * All arguments may be evaluated multiple times, beware of side effects.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + * Expected to be used from ndo_start_xmit, see the comment on top of the file.
> + *
> + * Returns:
> + *	 0 if the queue was stopped
> + *	 1 if the queue was left enabled
> + *	-1 if the queue was re-enabled (raced with waking)
> + */

We may want to change the values here. The most likely case is "left
enabled", so that should probably be the 0 case. I would then make 1
the re-enabled case and -1 the stopped case.

With that the decision tree becomes more straightforward, as we could
do something like:
	if (result) {
		if (result < 0) {
			/* increment stopped stat */
			return;
		}
		/* increment restarted stat */
	}

In addition, for readability we may want to consider adding an enum
similar to the netdev_tx_t enum, so the return values are named and can
be checked for explicitly should we want to pick out a specific one.
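
Something along these lines, purely as an illustration (names made up,
values matching the ordering I suggested above):

	enum netdev_queue_xmit_ret {
		NETDEV_QUEUE_LEFT_ENABLED	= 0,	/* common case, queue kept running */
		NETDEV_QUEUE_RESTARTED		= 1,	/* raced with waking, re-enabled */
		NETDEV_QUEUE_STOPPED		= -1,	/* queue was stopped */
	};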


> +#define netif_tx_queue_maybe_stop(txq, get_desc, stop_thrs, start_thrs)	\
> +	({								\
> +		int _res;						\
> +									\
> +		if (likely(get_desc > stop_thrs))			\
> +			_res = 1;					\
> +		else							\
> +			_res = netif_tx_queue_try_stop(txq, get_desc,	\
> +						       start_thrs);	\
> +		_res;							\
> +	})								\
> +
> +#define __netif_tx_queue_try_wake(txq, get_desc, start_thrs, down_cond) \
> +	({								\
> +		int _res;						\
> +									\
> +		/* Make sure that anybody stopping the queue after	\
> +		 * this sees the new next_to_clean.			\
> +		 */							\
> +		smp_mb();						\
> +		if (netif_tx_queue_stopped(txq) && !(down_cond)) {	\
> +			netif_tx_wake_queue(txq);			\
> +			_res = 0;					\
> +		} else {						\
> +			_res = 1;					\
> +		}							\
> +		_res;							\
> +	})
> +
> +#define netif_tx_queue_try_wake(txq, get_desc, start_thrs)		\
> +	__netif_tx_queue_try_wake(txq, get_desc, start_thrs, false)
> +
> +/**
> + * __netif_tx_queue_maybe_wake() - locklessly wake a Tx queue, if needed
> + * @txq:	struct netdev_queue to stop/start
> + * @get_desc:	get current number of free descriptors (see requirements below!)
> + * @start_thrs:	minimal number of descriptors to re-enable the queue
> + * @down_cond:	down condition, predicate indicating that the queue should
> + *		not be woken up even if descriptors are available
> + *
> + * All arguments may be evaluated multiple times.
> + * @get_desc must be a formula or a function call, it must always
> + * return up-to-date information when evaluated!
> + *
> + * Returns:
> + *	 0 if the queue was woken up
> + *	 1 if the queue was already enabled (or disabled but @down_cond is true)
> + *	-1 if the queue was left stopped
> + */

I would go with the same here. The most common case should probably be
0, which would be "already enabled", with 1 being woken up and -1 being
left stopped. Keeping the two macros consistent with each other would
also make them easier to understand.
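
As a rough example of what a driver's completion path would then look
like with those values (desc_avail() and the stats field are made-up
names, just for illustration):

	ret = netif_tx_queue_maybe_wake(txq, desc_avail(ring), start_thrs);
	if (ret > 0)
		ring->stats.restart_queue++;	/* only count actual wakes */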

> +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> +	({								\
> +		int _res;						\
> +									\
> +		if (likely(get_desc < start_thrs))			\
> +			_res = -1;					\
> +		else							\
> +			_res = __netif_tx_queue_try_wake(txq, get_desc,	\
> +							 start_thrs,	\
> +							 down_cond);	\
> +		_res;							\
> +	})
> +

The likely here is probably not correct. In most cases the queue will
have enough descriptors available to allow waking, since Tx cleanup can
usually run fairly fast compared to the transmit path itself, as it can
run without needing to take locks.
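
So I would expect something more like the following (just a sketch of
flipping the hint, not tested):

	if (unlikely(get_desc < start_thrs))
		_res = -1;
	else
		_res = __netif_tx_queue_try_wake(txq, get_desc,
						 start_thrs, down_cond);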

> +#define netif_tx_queue_maybe_wake(txq, get_desc, start_thrs)		\
> +	__netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, false)
> +
> +/* subqueue variants follow */
> +
> +#define netif_subqueue_try_stop(dev, idx, get_desc, start_thrs)		\
> +	({								\
> +		struct netdev_queue *txq;				\
> +									\
> +		txq = netdev_get_tx_queue(dev, idx);			\
> +		netif_tx_queue_try_stop(txq, get_desc, start_thrs);	\
> +	})
> +
> +#define netif_subqueue_maybe_stop(dev, idx, get_desc, stop_thrs, start_thrs) \
> +	({								\
> +		struct netdev_queue *txq;				\
> +									\
> +		txq = netdev_get_tx_queue(dev, idx);			\
> +		netif_tx_queue_maybe_stop(txq, get_desc,		\
> +					  stop_thrs, start_thrs);	\
> +	})
> +
> +#define __netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, down_cond) \
> +	({								\
> +		struct netdev_queue *txq;				\
> +									\
> +		txq = netdev_get_tx_queue(dev, idx);			\
> +		__netif_tx_queue_try_wake(txq, get_desc,		\
> +					  start_thrs, down_cond);	\
> +	})
> +
> +#define netif_subqueue_try_wake(dev, idx, get_desc, start_thrs)		\
> +	__netif_subqueue_try_wake(dev, idx, get_desc, start_thrs, false)
> +
> +#define __netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, down_cond) \
> +	({								\
> +		struct netdev_queue *txq;				\
> +									\
> +		txq = netdev_get_tx_queue(dev, idx);			\
> +		__netif_tx_queue_maybe_wake(txq, get_desc,		\
> +					    start_thrs, down_cond);	\
> +	})
> +
> +#define netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs)	\
> +	__netif_subqueue_maybe_wake(dev, idx, get_desc, start_thrs, false)
> +
> +#endif
