Message-ID: <751fd5bb13a49583b1593fa209bfabc4917290ae.camel@redhat.com>
Date: Tue, 28 Mar 2023 11:29:24 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Felix Fietkau <nbd@....name>, Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jonathan Corbet <corbet@....net>,
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net/core: add optional threading for backlog processing

On Fri, 2023-03-24 at 18:57 +0100, Felix Fietkau wrote:
> On 24.03.23 18:47, Jakub Kicinski wrote:
> > On Fri, 24 Mar 2023 18:35:00 +0100 Felix Fietkau wrote:
> > > I'm primarily testing this on routers with 2 or 4 CPUs and limited
> > > processing power, handling routing/NAT. RPS is typically needed to
> > > properly distribute the load across all available CPUs. When there is
> > > only a small number of flows that are pushing a lot of traffic, a static
> > > RPS assignment often leaves some CPUs idle, whereas others become a
> > > bottleneck by being fully loaded. Threaded NAPI reduces this a bit, but
> > > CPUs can become bottlenecked and fully loaded by a NAPI thread alone.
> >
> > The NAPI thread becomes a bottleneck with RPS enabled?
> The devices that I work with often only have a single rx queue. That can
> easily become a bottleneck.
>
> > > Making backlog processing threaded helps split up the processing work
> > > even more and distribute it onto remaining idle CPUs.
> >
> > You'd want to have both threaded NAPI and threaded backlog enabled?
> Yes
>
> > > It can basically be used to make RPS a bit more dynamic and
> > > configurable, because you can assign multiple backlog threads to a set
> > > of CPUs and selectively steer packets from specific devices / rx queues
> >
> > Can you give an example?
> >
> > With the 4 CPU example, in case 2 queues are very busy - you're trying
> > to make sure that the RPS does not end up landing on the same CPU as
> > the other busy queue?
> In this part I'm thinking about bigger systems where you want to have a
> group of CPUs dedicated to dealing with network traffic without
> assigning a fixed function (e.g. NAPI processing or RPS target) to each
> one, allowing for more dynamic processing.
>
> > > to them and allow the scheduler to take care of the rest.
> >
> > You trust the scheduler much more than I do, I think :)
> In my tests it brings down latency (both avg and p99) considerably in
> some cases. I posted some numbers here:
> https://lore.kernel.org/netdev/e317d5bc-cc26-8b1b-ca4b-66b5328683c4@nbd.name/

It's still not 110% clear to me why/how this additional thread could
reduce latency. What/which threads are competing for the busy CPU[s]?

I suspect it could be easier/cleaner to move away the other (non-RPS)
threads.

Cheers,

Paolo
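[Editor's note: for context on the static RPS assignment and threaded NAPI discussed in the thread, both are configured through sysfs. A minimal sketch follows; `eth0` and the CPU choice are examples, and the sysfs writes are commented out since they require root and a real device. The knob added by the patch under discussion for backlog threading is not shown here.]

```shell
# rps_cpus takes a hexadecimal CPU bitmask: bit n selects CPU n.
# Build a mask steering RPS to CPUs 2 and 3 (example choice):
mask=$(printf '%x' $(( (1 << 2) | (1 << 3) )))
echo "$mask"   # -> c

# Apply it to rx queue 0 of eth0 (commented out; needs root + device):
# echo "$mask" > /sys/class/net/eth0/queues/rx-0/rps_cpus

# Enable threaded NAPI for the device, so its NAPI instances are
# serviced by kthreads the scheduler can migrate across CPUs:
# echo 1 > /sys/class/net/eth0/threaded
```

With a static mask like this, the steering is fixed at configuration time, which is exactly the rigidity Felix describes: busy flows can pile onto one of the selected CPUs while others stay idle.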