Message-ID: <f250fc62-a4a6-6543-d688-e755729a7291@gmail.com>
Date: Mon, 24 Oct 2022 14:24:58 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Valentin Schneider <vschneid@...hat.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Tariq Toukan <tariqt@...dia.com>,
Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Yury Norov <yury.norov@...il.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Heiko Carstens <hca@...ux.ibm.com>,
Tony Luck <tony.luck@...el.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Gal Pressman <gal@...dia.com>,
Jesse Brandeburg <jesse.brandeburg@...el.com>
Subject: Re: [PATCH v5 3/3] net/mlx5e: Improve remote NUMA preferences used
for the IRQ affinity hints
On 10/21/2022 3:19 PM, Valentin Schneider wrote:
> From: Tariq Toukan <tariqt@...dia.com>
>
> In the IRQ affinity hints, replace the binary NUMA preference (local /
> remote) with the improved for_each_numa_hop_mask() API, which takes the
> actual inter-node distances into account, so that remote NUMA nodes at
> a short distance are preferred over farther ones.
>
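For reference, here is a minimal standalone userspace sketch of the
ordering this buys us (the 4-node distance table and the 2-CPUs-per-node
layout are hypothetical, purely for illustration): vectors are handed
CPUs of closer nodes before CPUs of farther ones.

    /* Userspace illustration only, not kernel code. */
    #include <stdio.h>

    #define NR_NODES      4
    #define CPUS_PER_NODE 2

    /* Hypothetical SLIT-style distance matrix: dist[i][j]. */
    static const int dist[NR_NODES][NR_NODES] = {
            { 10, 12, 20, 20 },
            { 12, 10, 20, 20 },
            { 20, 20, 10, 12 },
            { 20, 20, 12, 10 },
    };

    int main(void)
    {
            int home = 0, order[NR_NODES], picked[NR_NODES] = { 0 };
            int vec = 0;

            /* Selection sort of nodes by distance from the home node. */
            for (int i = 0; i < NR_NODES; i++) {
                    int best = -1;

                    for (int j = 0; j < NR_NODES; j++)
                            if (!picked[j] &&
                                (best < 0 || dist[home][j] < dist[home][best]))
                                    best = j;
                    picked[best] = 1;
                    order[i] = best;
            }

            /* Hand out CPUs in node order: the hint assignment order. */
            for (int i = 0; i < NR_NODES; i++)
                    for (int c = 0; c < CPUS_PER_NODE; c++)
                            printf("vector %d -> cpu %d (node %d, dist %d)\n",
                                   vec++, order[i] * CPUS_PER_NODE + c,
                                   order[i], dist[home][order[i]]);
            return 0;
    }
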
> This has significant performance implications when NUMA-aware allocated
> memory is in use (see [1] and its derivatives, for example).
>
> [1]
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
> int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
>
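Side note: the payoff of a better hint is NUMA-local memory. A driver
keys its per-queue allocations off the hint roughly along these lines;
the helper below is a kernel-style sketch with an illustrative name,
not the exact mlx5e code.

    /* Kernel-style sketch (illustrative name, not the exact mlx5e code):
     * place a queue's memory on the NUMA node of the CPU named by the
     * IRQ affinity hint, so interrupts and memory land on the same node.
     */
    static void *alloc_queue_mem_on_hint(struct mlx5_core_dev *mdev,
                                         int ix, size_t sz)
    {
            int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(mdev, ix));

            return kvzalloc_node(sz, GFP_KERNEL, cpu_to_node(cpu));
    }
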
> Performance tests:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> | | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline | 52.3 | 6.4 % | 17.9 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6 | 5.2 % | 18.5 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9 | 11.9 % | 27.2 % |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides | 95.1 | 8.4 % | 27.3 % |
> +-------------------------+-----------+------------------+------------------+
>
> The RX-side bottleneck is released; line rate is reached (~1.8x speedup).
> TX-side CPU utilization drops by ~30%.
>
> * CPU utilization is measured on the active cores only.
>
> Setups details (similar for both sides):
>
> NIC: ConnectX-6 Dx dual port, 100 Gbps each.
> Single port used in the tests.
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 256
> On-line CPU(s) list: 0-255
> Thread(s) per core: 2
> Core(s) per socket: 64
> Socket(s): 2
> NUMA node(s): 16
> Vendor ID: AuthenticAMD
> CPU family: 25
> Model: 1
> Model name: AMD EPYC 7763 64-Core Processor
> Stepping: 1
> CPU MHz: 2594.804
> BogoMIPS: 4890.73
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 32768K
> NUMA node0 CPU(s): 0-7,128-135
> NUMA node1 CPU(s): 8-15,136-143
> NUMA node2 CPU(s): 16-23,144-151
> NUMA node3 CPU(s): 24-31,152-159
> NUMA node4 CPU(s): 32-39,160-167
> NUMA node5 CPU(s): 40-47,168-175
> NUMA node6 CPU(s): 48-55,176-183
> NUMA node7 CPU(s): 56-63,184-191
> NUMA node8 CPU(s): 64-71,192-199
> NUMA node9 CPU(s): 72-79,200-207
> NUMA node10 CPU(s): 80-87,208-215
> NUMA node11 CPU(s): 88-95,216-223
> NUMA node12 CPU(s): 96-103,224-231
> NUMA node13 CPU(s): 104-111,232-239
> NUMA node14 CPU(s): 112-119,240-247
> NUMA node15 CPU(s): 120-127,248-255
> ..
...
>
> Signed-off-by: Tariq Toukan <tariqt@...dia.com>
> [Tweaked API use]
Thanks for the modification; it looks good to me.
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
> Signed-off-by: Valentin Schneider <vschneid@...hat.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index a0242dc15741c..7acbeb3d51846 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,9 +812,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
> static int comp_irqs_request(struct mlx5_core_dev *dev)
> {
> struct mlx5_eq_table *table = dev->priv.eq_table;
> + const struct cpumask *prev = cpu_none_mask;
> + const struct cpumask *mask;
> int ncomp_eqs = table->num_comp_eqs;
> u16 *cpus;
> int ret;
> + int cpu;
> int i;
>
> ncomp_eqs = table->num_comp_eqs;
> @@ -833,8 +836,19 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
> ret = -ENOMEM;
> goto free_irqs;
> }
> - for (i = 0; i < ncomp_eqs; i++)
> - cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> + i = 0;
> + rcu_read_lock();
> + for_each_numa_hop_mask(mask, dev->priv.numa_node) {
> + for_each_cpu_andnot(cpu, mask, prev) {
> + cpus[i] = cpu;
> + if (++i == ncomp_eqs)
> + goto spread_done;
> + }
> + prev = mask;
> + }
> +spread_done:
> + rcu_read_unlock();
> ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
> kfree(cpus);
> if (ret < 0)
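One detail worth calling out for readers of the hunk above: each
successive hop mask is a superset of the previous one, so iterating
over (mask & ~prev) visits only the CPUs added at the current distance,
and every CPU exactly once. Below is a minimal userspace model of that
step, with plain 64-bit masks standing in for cpumasks (the cumulative
masks are made up for illustration).

    #include <stdio.h>

    int main(void)
    {
            /* Cumulative hop masks: each one is a superset of the last. */
            unsigned long long hops[] = { 0x0fULL, 0xffULL, 0xffffULL };
            unsigned long long prev = 0;    /* cpu_none_mask equivalent */

            for (int h = 0; h < 3; h++) {
                    /* for_each_cpu_andnot(): in hops[h] but not in prev. */
                    unsigned long long added = hops[h] & ~prev;

                    for (int cpu = 0; cpu < 64; cpu++)
                            if (added & (1ULL << cpu))
                                    printf("hop %d: cpu %d\n", h, cpu);
                    prev = hops[h];
            }
            return 0;
    }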