linux-kernel - Re: [PATCH net-next] net/core: add optional threading for backlog processing

Message-ID: <20230324104733.571466bc@kernel.org>
Date:   Fri, 24 Mar 2023 10:47:33 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Felix Fietkau <nbd@....name>
Cc:     netdev@...r.kernel.org, Jonathan Corbet <corbet@....net>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net/core: add optional threading for backlog
 processing

On Fri, 24 Mar 2023 18:35:00 +0100 Felix Fietkau wrote:
> I'm primarily testing this on routers with 2 or 4 CPUs and limited 
> processing power, handling routing/NAT. RPS is typically needed to 
> properly distribute the load across all available CPUs. When there is 
> only a small number of flows that are pushing a lot of traffic, a static 
> RPS assignment often leaves some CPUs idle, whereas others become a 
> bottleneck by being fully loaded. Threaded NAPI reduces this a bit, but 
> CPUs can become bottlenecked and fully loaded by a NAPI thread alone.

The NAPI thread becomes a bottleneck with RPS enabled?

> Making backlog processing threaded helps split up the processing work 
> even more and distribute it onto remaining idle CPUs.

You'd want to have both threaded NAPI and threaded backlog enabled?

> It can basically be used to make RPS a bit more dynamic and 
> configurable, because you can assign multiple backlog threads to a set 
> of CPUs and selectively steer packets from specific devices / rx queues 

Can you give an example?

With the 4 CPU example, in case 2 queues are very busy - you're trying
to make sure that the RPS does not end up landing on the same CPU as
the other busy queue?

> to them and allow the scheduler to take care of the rest.

You trust the scheduler much more than I do, I think :)
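
For illustration only (not part of the patch or of either message): a minimal sketch of the imbalance described in the quoted text, where a static RPS assignment with only a few heavy flows leaves some CPUs idle. The flow hashes, the load figures (percent of one CPU), and the `hash % n_cpus` mapping are assumptions made up for this example; the kernel's actual RPS CPU selection is more involved.

```python
def static_rps_load(flows, n_cpus):
    """Sum per-CPU load for a static, hash-based flow-to-CPU assignment.

    flows: list of (flow_hash, load_percent) pairs (hypothetical values).
    The modulo mapping below is a simplification, not the kernel's code.
    """
    load = [0] * n_cpus
    for flow_hash, load_percent in flows:
        cpu = flow_hash % n_cpus  # a given flow always lands on the same CPU
        load[cpu] += load_percent
    return load


def idle_cpus(load):
    """Return the indices of CPUs that received no flows at all."""
    return [cpu for cpu, cpu_load in enumerate(load) if cpu_load == 0]


# Three flows on a 4-CPU router: two heavy flows happen to hash to CPU 0.
flows = [(8, 90), (12, 80), (5, 10)]
per_cpu = static_rps_load(flows, 4)

print(per_cpu)             # [170, 10, 0, 0]
print(idle_cpus(per_cpu))  # [2, 3]
```

In this toy case CPU 0 is asked to do 170% of its capacity while CPUs 2 and 3 sit idle, which is the situation the proposal hopes to ease by letting the scheduler move backlog threads.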