Message-ID: <dd41a7d6-892f-47a9-8ac7-80d7ac8b5d7f@intel.com>
Date: Thu, 2 Jan 2025 13:16:24 -0800
From: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>
To: Samiullah Khawaja <skhawaja@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	"David S . Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Paolo Abeni <pabeni@...hat.com>
CC: <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 3/3] Extend napi threaded polling to allow
 kthread based busy polling



On 1/2/2025 11:12 AM, Samiullah Khawaja wrote:
> Add a new state to napi state enum:
> 
> - STATE_THREADED_BUSY_POLL
>    Threaded busy poll is enabled/running for this napi.
> 
> Following changes are introduced in the napi scheduling and state logic:
> 
> - When threaded busy poll is enabled through sysfs it also enables
>    NAPI_STATE_THREADED so a kthread is created per napi. It also sets
>    NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
>    supposed to busy poll for each napi.

Looks like this patch changes the sysfs 'threaded' field from a boolean 
to an integer, with the value 2 indicating threaded mode with busy 
polling.
So I think the commit message above should reflect that, instead of 
just saying that it enables both threaded mode and busy polling.

> 
> - When napi is scheduled with STATE_SCHED_THREADED and associated
>    kthread is woken up, the kthread owns the context. If
>    NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
>    then it means that we can busy poll.
> 
> - To keep busy polling and to avoid scheduling of the interrupts, the
>    napi_complete_done returns false when both SCHED_THREADED and
>    THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
>    early to avoid the STATE_SCHED_THREADED being unset.
> 
> - If at any point STATE_THREADED_BUSY_POLL is unset, the
>    napi_complete_done will run and unset the SCHED_THREADED bit also.
>    This will make the associated kthread go to sleep as per existing
>    logic.

When does STATE_THREADED_BUSY_POLL get unset? Don't we need a timeout 
value to come out of busypoll mode if there is no traffic?

> 
> Signed-off-by: Samiullah Khawaja <skhawaja@...gle.com>
> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> ---
>   Documentation/ABI/testing/sysfs-class-net     |  3 +-
>   Documentation/netlink/specs/netdev.yaml       |  5 +-
>   .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
>   include/linux/netdevice.h                     | 24 +++++--
>   net/core/dev.c                                | 72 ++++++++++++++++---
>   net/core/net-sysfs.c                          |  2 +-
>   net/core/netdev-genl-gen.c                    |  2 +-
>   7 files changed, 89 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index ebf21beba846..15d7d36a8294 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -343,7 +343,7 @@ Date:		Jan 2021
>   KernelVersion:	5.12
>   Contact:	netdev@...r.kernel.org
>   Description:
> -		Boolean value to control the threaded mode per device. User could
> +		Integer value to control the threaded mode per device. User could
>   		set this value to enable/disable threaded mode for all napi
>   		belonging to this device, without the need to do device up/down.
>   
> @@ -351,4 +351,5 @@ Description:
>   		== ==================================
>   		0  threaded mode disabled for this dev
>   		1  threaded mode enabled for this dev
> +		2  threaded mode enabled, and busy polling enabled.
>   		== ==================================
> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index aac343af7246..9c905243a1cc 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -272,10 +272,11 @@ attribute-sets:
>           name: threaded
>           doc: Whether the napi is configured to operate in threaded polling
>                mode. If this is set to `1` then the NAPI context operates
> -             in threaded polling mode.
> +             in threaded polling mode. If this is set to `2` then the NAPI
> +             kthread also does busypolling.
>           type: u32
>           checks:
> -          max: 1
> +          max: 2
>     -
>       name: queue
>       attributes:
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index c571614b1d50..a709cddcd292 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	adapter->mii.mdio_write = atl1c_mdio_write;
>   	adapter->mii.phy_id_mask = 0x1f;
>   	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> -	dev_set_threaded(netdev, true);
> +	dev_set_threaded(netdev, DEV_NAPI_THREADED);
>   	for (i = 0; i < adapter->rx_queue_count; ++i)
>   		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
>   			       atl1c_clean_rx);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8f531d528869..c384ffe0976e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -407,6 +407,8 @@ enum {
>   	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
>   	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
>   	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
> +	NAPI_STATE_THREADED_BUSY_POLL,	/* The threaded napi poller will busy poll */
> +	NAPI_STATE_SCHED_THREADED_BUSY_POLL,  /* The threaded napi poller is busy polling */
>   };
>   
>   enum {
> @@ -420,8 +422,14 @@ enum {
>   	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
>   	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
>   	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
> +	NAPIF_STATE_THREADED_BUSY_POLL	= BIT(NAPI_STATE_THREADED_BUSY_POLL),
> +	NAPIF_STATE_SCHED_THREADED_BUSY_POLL
> +				= BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
>   };
>   
> +#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
> +	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
> +
>   enum gro_result {
>   	GRO_MERGED,
>   	GRO_MERGED_FREE,
> @@ -568,16 +576,24 @@ static inline bool napi_complete(struct napi_struct *n)
>   	return napi_complete_done(n, 0);
>   }
>   
> -int dev_set_threaded(struct net_device *dev, bool threaded);
> +enum napi_threaded_state {
> +	NAPI_THREADED_OFF = 0,
> +	NAPI_THREADED = 1,
> +	NAPI_THREADED_BUSY_POLL = 2,
> +	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
> +};
> +
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded);
>   
>   /*
>    * napi_set_threaded - set napi threaded state
>    * @napi: NAPI context
> - * @threaded: whether this napi does threaded polling
> + * @threaded: threading mode
>    *
>    * Return 0 on success and negative errno on failure.
>    */
> -int napi_set_threaded(struct napi_struct *napi, bool threaded);
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded);
>   
>   /**
>    *	napi_disable - prevent NAPI from scheduling
> @@ -2406,7 +2422,7 @@ struct net_device {
>   	struct sfp_bus		*sfp_bus;
>   	struct lock_class_key	*qdisc_tx_busylock;
>   	bool			proto_down;
> -	bool			threaded;
> +	u8			threaded;
>   
>   	/* priv_flags_slow, ungrouped to save space */
>   	unsigned long		see_all_hwtstamp_requests:1;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 762977a62da2..b6cd9474bdd3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -78,6 +78,7 @@
>   #include <linux/slab.h>
>   #include <linux/sched.h>
>   #include <linux/sched/isolation.h>
> +#include <linux/sched/types.h>
>   #include <linux/sched/mm.h>
>   #include <linux/smpboot.h>
>   #include <linux/mutex.h>
> @@ -6231,7 +6232,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>   	 *    the guarantee we will be called later.
>   	 */
>   	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
> -				 NAPIF_STATE_IN_BUSY_POLL)))
> +				 NAPIF_STATE_IN_BUSY_POLL |
> +				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
>   		return false;
>   
>   	if (work_done) {
> @@ -6633,8 +6635,10 @@ static void init_gro_hash(struct napi_struct *napi)
>   	napi->gro_bitmask = 0;
>   }
>   
> -int napi_set_threaded(struct napi_struct *napi, bool threaded)
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded)
>   {
> +	unsigned long val;
>   	if (napi->dev->threaded)
>   		return -EINVAL;
>   
> @@ -6649,30 +6653,41 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
>   
>   	/* Make sure kthread is created before THREADED bit is set. */
>   	smp_mb__before_atomic();
> -	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +	val = 0;
> +	if (threaded == NAPI_THREADED_BUSY_POLL)
> +		val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +	if (threaded)
> +		val |= NAPIF_STATE_THREADED;
> +	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>   
>   	return 0;
>   }
>   
> -int dev_set_threaded(struct net_device *dev, bool threaded)
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded)
>   {
>   	struct napi_struct *napi;
> +	unsigned long val;
>   	int err = 0;
>   
>   	if (dev->threaded == threaded)
>   		return 0;
>   
> +	val = 0;
>   	if (threaded) {
>   		/* Check if threaded is set at napi level already */
>   		list_for_each_entry(napi, &dev->napi_list, dev_list)
>   			if (test_bit(NAPI_STATE_THREADED, &napi->state))
>   				return -EINVAL;
>   
> +		val |= NAPIF_STATE_THREADED;
> +		if (threaded == NAPI_THREADED_BUSY_POLL)
> +			val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +
>   		list_for_each_entry(napi, &dev->napi_list, dev_list) {
>   			if (!napi->thread) {
>   				err = napi_kthread_create(napi);
>   				if (err) {
> -					threaded = false;
> +					threaded = NAPI_THREADED_OFF;
>   					break;
>   				}
>   			}
> @@ -6691,9 +6706,13 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
>   	 * polled. In this case, the switch between threaded mode and
>   	 * softirq mode will happen in the next round of napi_schedule().
>   	 * This should not cause hiccups/stalls to the live traffic.
> +	 *
> +	 * Switch to busy_poll threaded napi will occur after the threaded
> +	 * napi is scheduled.
>   	 */
>   	list_for_each_entry(napi, &dev->napi_list, dev_list)
> -		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +		set_mask_bits(&napi->state,
> +			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>   
>   	return err;
>   }
> @@ -7007,7 +7026,7 @@ static int napi_thread_wait(struct napi_struct *napi)
>   	return -1;
>   }
>   
> -static void napi_threaded_poll_loop(struct napi_struct *napi)
> +static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
>   {
>   	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
>   	struct softnet_data *sd;
> @@ -7036,22 +7055,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
>   		}
>   		skb_defer_free_flush(sd);
>   		bpf_net_ctx_clear(bpf_net_ctx);
> +
> +		/* Push the skbs up the stack if busy polling. */
> +		if (busy_poll)
> +			__napi_gro_flush_helper(napi);
>   		local_bh_enable();
>   
> -		if (!repoll)
> +		/* If busy polling then do not break here because we need to
> +		 * call cond_resched and rcu_softirq_qs_periodic to prevent
> +		 * watchdog warnings.
> +		 */
> +		if (!repoll && !busy_poll)
>   			break;
>   
>   		rcu_softirq_qs_periodic(last_qs);
>   		cond_resched();
> +
> +		if (!repoll)
> +			break;
>   	}
>   }
>   
>   static int napi_threaded_poll(void *data)
>   {
>   	struct napi_struct *napi = data;
> +	bool busy_poll_sched;
> +	unsigned long val;
> +	bool busy_poll;
> +
> +	while (!napi_thread_wait(napi)) {
> +		/* Once woken up, this means that we are scheduled as threaded
> +		 * napi and this thread owns the napi context, if busy poll
> +		 * state is set then we busy poll this napi.
> +		 */
> +		val = READ_ONCE(napi->state);
> +		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
> +		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
> +
> +		/* Do not busy poll if napi is disabled. */
> +		if (unlikely(val & NAPIF_STATE_DISABLE))
> +			busy_poll = false;
> +
> +		if (busy_poll != busy_poll_sched)
> +			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
> +				   &napi->state, busy_poll);
>   
> -	while (!napi_thread_wait(napi))
> -		napi_threaded_poll_loop(napi);
> +		napi_threaded_poll_loop(napi, busy_poll);
> +	}
>   
>   	return 0;
>   }
> @@ -12205,7 +12255,7 @@ static void run_backlog_napi(unsigned int cpu)
>   {
>   	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
>   
> -	napi_threaded_poll_loop(&sd->backlog);
> +	napi_threaded_poll_loop(&sd->backlog, false);
>   }
>   
>   static void backlog_napi_setup(unsigned int cpu)
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 2d9afc6e2161..36d0a22e341c 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -626,7 +626,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
>   	if (list_empty(&dev->napi_list))
>   		return -EOPNOTSUPP;
>   
> -	if (val != 0 && val != 1)
> +	if (val > NAPI_THREADED_MAX)
>   		return -EOPNOTSUPP;
>   
>   	ret = dev_set_threaded(dev, val);
> diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
> index 93dc74dad6de..4086d2577dcc 100644
> --- a/net/core/netdev-genl-gen.c
> +++ b/net/core/netdev-genl-gen.c
> @@ -102,7 +102,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPE
>   /* NETDEV_CMD_NAPI_SET_THREADED - do */
>   static const struct nla_policy netdev_napi_set_threaded_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
>   	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
> -	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
> +	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
>   };
>   
>   /* Ops table for netdev */

