lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2d251879-1cf4-237d-8e62-c42bb4feb047@nbd.name>
Date:   Fri, 24 Mar 2023 18:35:00 +0100
From:   Felix Fietkau <nbd@....name>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, Jonathan Corbet <corbet@....net>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net/core: add optional threading for backlog
 processing

On 24.03.23 18:20, Jakub Kicinski wrote:
> On Fri, 24 Mar 2023 18:13:14 +0100 Felix Fietkau wrote:
>> When dealing with few flows or an imbalance on CPU utilization, static RPS
>> CPU assignment can be too inflexible. Add support for enabling threaded NAPI
>> for backlog processing in order to allow the scheduler to better balance
>> processing. This helps better spread the load across idle CPUs.
> 
> Can you explain the use case a little bit more?

I'm primarily testing this on routers with 2 or 4 CPUs and limited 
processing power, handling routing/NAT. RPS is typically needed to 
properly distribute the load across all available CPUs. When there is 
only a small number of flows that are pushing a lot of traffic, a static 
RPS assignment often leaves some CPUs idle, whereas others become a 
bottleneck by being fully loaded. Threaded NAPI reduces this a bit, but 
CPUs can become bottlenecked and fully loaded by a NAPI thread alone.

Making backlog processing threaded helps split up the processing work 
even more and distribute it onto remaining idle CPUs.

It can basically be used to make RPS a bit more dynamic and 
configurable, because you can assign multiple backlog threads to a set 
of CPUs and selectively steer packets from specific devices / rx queues 
to them and allow the scheduler to take care of the rest.

- Felix

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ