Message-ID: <20250708133132.GL452973@horms.kernel.org>
Date: Tue, 8 Jul 2025 14:31:32 +0100
From: Simon Horman <horms@...nel.org>
To: Jeroen de Borst <jeroendb@...gle.com>
Cc: netdev@...r.kernel.org, hramamurthy@...gle.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, willemb@...gle.com,
pabeni@...hat.com, Bailey Forrest <bcf@...gle.com>,
Joshua Washington <joshwash@...gle.com>
Subject: Re: [PATCH net-next v2] gve: make IRQ handlers and page allocation
NUMA aware
On Mon, Jul 07, 2025 at 02:01:07PM -0700, Jeroen de Borst wrote:
> From: Bailey Forrest <bcf@...gle.com>
>
> All memory in GVE is currently allocated without regard for the NUMA
> node of the device. Because access to NUMA-local memory is
> significantly cheaper than access to a remote node, this change attempts
> to ensure that page frags used in the RX path, including page pool
> frags, are allocated on the NUMA node local to the gVNIC device. Note
> that this attempt is best-effort. If necessary, the driver will still
> allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor
> ring allocations are not updated, as dma_alloc_coherent handles that.
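
(Aside, for illustration only and not the gve code itself: a minimal sketch
of best-effort node-local RX buffer allocation, assuming dev_to_node()
reports the device's node and using the page_pool .nid parameter. Since
__GFP_THISNODE is not passed, the allocator may still fall back to a
remote node.)

	#include <linux/device.h>
	#include <linux/gfp.h>
	#include <net/page_pool/helpers.h>

	/* Hypothetical helper: allocate a page frag's backing page as close
	 * to the device as possible. dev_to_node() may return NUMA_NO_NODE,
	 * in which case the allocation simply follows the default policy.
	 */
	static struct page *rx_alloc_page_near_dev(struct device *dev, gfp_t gfp)
	{
		return alloc_pages_node(dev_to_node(dev), gfp, 0);
	}

	/* Hypothetical page pool setup: .nid steers pool pages towards the
	 * device's node, again on a best-effort basis.
	 */
	static struct page_pool *rx_create_pool(struct device *dev)
	{
		struct page_pool_params pp = {
			.order		= 0,
			.pool_size	= 1024,
			.nid		= dev_to_node(dev),
			.dev		= dev,
		};

		return page_pool_create(&pp);
	}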
>
> This change also modifies the IRQ affinity setting to only select CPUs
> from the node local to the device, preserving the behavior that TX and
> RX queues of the same index share CPU affinity.
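
(Again just a sketch rather than the driver code: the affinity scheme
described above in standalone form, assuming cpumask_of_node() is given a
valid node and that the first 'ntx' vectors are TX and the rest RX. The
actual gve change is in the diff below.)

	#include <linux/cpumask.h>
	#include <linux/interrupt.h>
	#include <linux/topology.h>

	/* Hypothetical: spread nvecs IRQs over the CPUs of one NUMA node,
	 * restarting from the first CPU when the node is exhausted or when
	 * the RX vectors begin, so TX and RX queue i share a CPU.
	 */
	static void set_node_local_affinity(int node, unsigned int *irqs,
					    int nvecs, int ntx)
	{
		const struct cpumask *node_mask = cpumask_of_node(node);
		int cpu = cpumask_first(node_mask);
		int i;

		for (i = 0; i < nvecs; i++) {
			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu));
			cpu = cpumask_next(cpu, node_mask);
			if (cpu >= nr_cpu_ids || i + 1 == ntx)
				cpu = cpumask_first(node_mask);
		}
	}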
>
> Signed-off-by: Bailey Forrest <bcf@...gle.com>
> Signed-off-by: Joshua Washington <joshwash@...gle.com>
> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@...gle.com>
> Signed-off-by: Jeroen de Borst <jeroendb@...gle.com>
> ---
> v1: https://lore.kernel.org/netdev/20250627183141.3781516-1-hramamurthy@google.com/
> v2:
> - Utilize kvcalloc_node instead of kvzalloc_node for array-type
> allocations.
Thanks for the update.
I note that this addresses Jakub's review of v1.

I have a minor suggestion below, but I don't think it warrants
blocking progress of this patch.

Reviewed-by: Simon Horman <horms@...nel.org>
...
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
...
> @@ -533,6 +540,8 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
>  	}
>
>  	/* Setup the other blocks - the first n-1 vectors */
> +	node_mask = gve_get_node_mask(priv);
> +	cur_cpu = cpumask_first(node_mask);
>  	for (i = 0; i < priv->num_ntfy_blks; i++) {
>  		struct gve_notify_block *block = &priv->ntfy_blocks[i];
>  		int msix_idx = i;
> @@ -549,9 +558,17 @@ static int gve_alloc_notify_blocks(struct gve_priv *priv)
>  			goto abort_with_some_ntfy_blocks;
>  		}
>  		block->irq = priv->msix_vectors[msix_idx].vector;
> -		irq_set_affinity_hint(priv->msix_vectors[msix_idx].vector,
> -				      get_cpu_mask(i % active_cpus));
> +		irq_set_affinity_and_hint(block->irq,
> +					  cpumask_of(cur_cpu));
>  		block->irq_db_index = &priv->irq_db_indices[i].index;
> +
> +		cur_cpu = cpumask_next(cur_cpu, node_mask);
> +		/* Wrap once CPUs in the node have been exhausted, or when
> +		 * starting RX queue affinities. TX and RX queues of the same
> +		 * index share affinity.
> +		 */
> +		if (cur_cpu >= nr_cpu_ids || (i + 1) == priv->tx_cfg.max_queues)
> +			cur_cpu = cpumask_first(node_mask);
FWIW, maybe this can be written more succinctly as follows.
(Completely untested!)

		/* TX and RX queues of the same index share affinity. */
		if (i + 1 == priv->tx_cfg.max_queues)
			cur_cpu = cpumask_first(node_mask);
		else
			cur_cpu = cpumask_next_wrap(cur_cpu, node_mask);
...