Message-ID: <f59ee83f-7267-04df-7286-f7ea147b5b49@nbd.name>
Date: Fri, 24 Mar 2023 18:57:03 +0100
From: Felix Fietkau <nbd@....name>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jonathan Corbet <corbet@....net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net/core: add optional threading for backlog processing

On 24.03.23 18:47, Jakub Kicinski wrote:
> On Fri, 24 Mar 2023 18:35:00 +0100 Felix Fietkau wrote:
>> I'm primarily testing this on routers with 2 or 4 CPUs and limited
>> processing power, handling routing/NAT. RPS is typically needed to
>> properly distribute the load across all available CPUs. When there is
>> only a small number of flows that are pushing a lot of traffic, a static
>> RPS assignment often leaves some CPUs idle, whereas others become a
>> bottleneck by being fully loaded. Threaded NAPI reduces this a bit, but
>> CPUs can become bottlenecked and fully loaded by a NAPI thread alone.
>
> The NAPI thread becomes a bottleneck with RPS enabled?
The devices that I work with often only have a single rx queue. That can
easily become a bottleneck.
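
A rough sketch of the static-assignment problem described in the quoted text: with only a few heavy flows, a fixed hash-to-CPU mapping cannot use every CPU. This is a toy model, not the kernel's implementation. The flow tuples, packet rates, and the helper names rps_cpu and load_per_cpu are made up for illustration, and crc32 merely stands in for the real packet hash.

```python
import zlib
from collections import Counter


def rps_cpu(flow, cpus):
    """Pick a CPU for a flow from a fixed CPU list using a stable hash.

    Assumption: crc32 is only a stand-in for the real skb hash.
    """
    key = "|".join(map(str, flow)).encode()
    return cpus[zlib.crc32(key) % len(cpus)]


def load_per_cpu(flows, cpus):
    """Sum the packet rate of every flow onto the CPU it hashes to."""
    load = Counter({cpu: 0 for cpu in cpus})
    for flow, pps in flows.items():
        load[rps_cpu(flow, cpus)] += pps
    return dict(load)


# Hypothetical 4-CPU router with three flows, two of them heavy.
cpus = [0, 1, 2, 3]
flows = {
    ("10.0.0.2", 40000, "192.0.2.1", 443): 90000,
    ("10.0.0.3", 40001, "192.0.2.1", 443): 80000,
    ("10.0.0.4", 40002, "192.0.2.1", 443): 5000,
}

load = load_per_cpu(flows, cpus)
idle = [cpu for cpu, pps in load.items() if pps == 0]
print("per-CPU load (pps):", load)
print("idle CPUs:", idle)
```

With three flows and four CPUs, at least one CPU always stays idle in this model, and a hash collision between the two heavy flows would put most of the load on a single CPU.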
>> Making backlog processing threaded helps split up the processing work
>> even more and distribute it onto remaining idle CPUs.
>
> You'd want to have both threaded NAPI and threaded backlog enabled?
Yes
>> It can basically be used to make RPS a bit more dynamic and
>> configurable, because you can assign multiple backlog threads to a set
>> of CPUs and selectively steer packets from specific devices / rx queues
>
> Can you give an example?
>
> With the 4 CPU example, in case 2 queues are very busy - you're trying
> to make sure that the RPS does not end up landing on the same CPU as
> the other busy queue?
In this part I'm thinking about bigger systems where you want to have a
group of CPUs dedicated to dealing with network traffic without
assigning a fixed function (e.g. NAPI processing or RPS target) to each
one, allowing for more dynamic processing.
>> to them and allow the scheduler to take care of the rest.
>
> You trust the scheduler much more than I do, I think :)
In my tests it brings down latency (both avg and p99) considerably in
some cases. I posted some numbers here:
https://lore.kernel.org/netdev/e317d5bc-cc26-8b1b-ca4b-66b5328683c4@nbd.name/
- Felix