Message-ID: <20220218211911.432f3811@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Fri, 18 Feb 2022 21:19:11 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: ecree.xilinx@...il.com, habetsm.xilinx@...il.com
Cc: Íñigo Huguet <ihuguet@...hat.com>,
davem@...emloft.net, netdev@...r.kernel.org
Subject: Re: [PATCH net-next resend 0/2] sfc: optimize RXQs count and
affinities
On Wed, 16 Feb 2022 10:41:37 +0100 Íñigo Huguet wrote:
> In the sfc driver, one RX queue per physical core was allocated by default.
> Later on, IRQ affinities were set, spreading the IRQs across all NUMA-local
> CPUs.
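>
> Roughly, that spreading picks one NUMA-local CPU per channel with
> cpumask_local_spread() and hints the IRQ to it, along these lines
> (illustrative sketch only, not the exact driver code):
>
> 	/* Spread channel IRQs over CPUs local to the NIC's NUMA node */
> 	efx_for_each_channel(channel, efx) {
> 		unsigned int cpu;
>
> 		cpu = cpumask_local_spread(channel->channel,
> 					   pcibus_to_node(efx->pci_dev->bus));
> 		irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
> 	}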
>
> However, that default results in a configuration that is far from optimal
> on many modern systems. Specifically, on systems with hyper-threading and
> 2 NUMA nodes, affinities are set in a way that IRQs are handled by all
> logical cores of one single NUMA node. Handling IRQs on both hyper-threading
> siblings of a core brings no benefit, and setting affinities to one queue
> per physical core (spreading them across both nodes) is not a good idea
> either, because there is a performance penalty for moving data across nodes
> (I was able to verify this with some XDP tests using pktgen).
>
> These patches reduce the default number of channels to one per physical
> core in the local NUMA node, and then set IRQ affinities to CPUs in the
> local NUMA node only (see the sketch below). This way we save hardware
> resources, since channels are a limited resource, and we also leave more
> room for XDP_TX channels without hitting the driver's limit of 32 channels
> per interface.
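>
> A rough sketch of the channel-count logic (illustrative only; the helper
> name and the fallback are hypothetical, not the exact patch code):
>
> 	/* Count one CPU per physical core on the NIC's NUMA node */
> 	static unsigned int efx_num_local_cores(struct efx_nic *efx)
> 	{
> 		int node = pcibus_to_node(efx->pci_dev->bus);
> 		cpumask_var_t mask;
> 		unsigned int count = 0;
> 		int cpu;
>
> 		if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
> 			return num_online_cpus();	/* conservative fallback */
>
> 		/* online CPUs on the device's local node... */
> 		cpumask_and(mask, cpu_online_mask, cpumask_of_node(node));
> 		for_each_cpu(cpu, mask) {
> 			count++;
> 			/* ...minus the hyper-threading siblings already counted */
> 			cpumask_andnot(mask, mask, topology_sibling_cpumask(cpu));
> 		}
>
> 		free_cpumask_var(mask);
> 		return count;
> 	}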
>
> Running performance tests using iperf with an SFC9140 device showed no
> performance penalty from reducing the number of channels.
>
> RX XDP tests showed that performance can drop to less than half if the
> IRQ is handled by a CPU in a different NUMA node, which doesn't happen
> with the new defaults from these patches.
Martin, Ed, any thoughts?