Message-ID: <20240813171015.425f239e@kernel.org>
Date: Tue, 13 Aug 2024 17:10:15 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Martin Karsten <mkarsten@...terloo.ca>
Cc: Stanislav Fomichev <sdf@...ichev.me>, netdev@...r.kernel.org, Joe Damato
 <jdamato@...tly.com>, amritha.nambiar@...el.com,
 sridhar.samudrala@...el.com, Alexander Lobakin
 <aleksander.lobakin@...el.com>, Alexander Viro <viro@...iv.linux.org.uk>,
 Breno Leitao <leitao@...ian.org>, Christian Brauner <brauner@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jan Kara
 <jack@...e.cz>, Jiri Pirko <jiri@...nulli.us>, Johannes Berg
 <johannes.berg@...el.com>, Jonathan Corbet <corbet@....net>, "open
 list:DOCUMENTATION" <linux-doc@...r.kernel.org>, "open list:FILESYSTEMS
 (VFS and infrastructure)" <linux-fsdevel@...r.kernel.org>, open list
 <linux-kernel@...r.kernel.org>, Lorenzo Bianconi <lorenzo@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll

On Mon, 12 Aug 2024 17:46:42 -0400 Martin Karsten wrote:
> >> Here's how it is intended to work:
> >>    - An administrator sets the existing sysfs parameters for
> >>      defer_hard_irqs and gro_flush_timeout to enable IRQ deferral.
> >>
> >>    - An administrator sets the new sysfs parameter irq_suspend_timeout
> >>      to a larger value than gro_flush_timeout to enable IRQ suspension.  
> > 
> > Can you expand more on what's the problem with the existing gro_flush_timeout?
> > Is it defer_hard_irqs_count? Or you want a separate timeout only for the
> > prefer_busy_poll case (why?)? Because looking at the first two patches,
> > you essentially replace all usages of gro_flush_timeout with a new variable
> > and I don't see how it helps.  
> 
> gro-flush-timeout (in combination with defer-hard-irqs) is the default 
> irq deferral mechanism and as such, always active when configured. Its 
> static periodic softirq processing leads to a situation where:
> 
> - A long gro-flush-timeout causes high latencies when load is 
> sufficiently below capacity, or
> 
> - a short gro-flush-timeout causes overhead when softirq execution 
> asynchronously competes with application processing at high load.
> 
> The shortcomings of this are documented (to some extent) by our 
> experiments. See defer20 working well at low load, but having problems 
> at high load, while defer200 has higher latency at low load.
> 
> irq-suspend-timeout is only active when an application uses 
> prefer-busy-polling and in that case, produces a nice alternating 
> pattern of application processing and networking processing (similar to 
> what we describe in the paper). This then works well with both low and 
> high load.

What about NIC interrupt coalescing? defer_hard_irqs_count was supposed
to be used with NICs which either don't have IRQ coalescing or have a
broken implementation. The timeout of 200usec should be perfectly within
range of what NICs can support.

If the NIC IRQ coalescing works, instead of adding a new timeout value
we could add a new deferral control (replacing defer_hard_irqs_count)
which would always kick in after seeing prefer_busy_poll() but also
not kick in if the busy poll harvested 0 packets.
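
A minimal sketch of the control described above, for illustration only.
The function name and its arguments are hypothetical, not existing kernel
interfaces; the code only models the decision "defer after
prefer_busy_poll, unless the busy poll harvested 0 packets":

```python
def should_defer_irq(saw_prefer_busy_poll: bool, packets_harvested: int) -> bool:
    """Hypothetical deferral policy (not a kernel API).

    Defer re-enabling the NIC IRQ only when the application used
    prefer-busy-poll AND its last busy poll actually found packets.
    """
    if not saw_prefer_busy_poll:
        # No busy-polling application: leave it to NIC IRQ coalescing.
        return False
    # An empty poll means the queue is idle, so hand control back to the IRQ.
    return packets_harvested > 0


# Example decisions for three situations.
decisions = {
    "busy poll found work": should_defer_irq(True, 8),
    "busy poll found nothing": should_defer_irq(True, 0),
    "no busy poll": should_defer_irq(False, 8),
}
```

With this shape there is no timeout value to tune; whether NIC coalescing
alone covers the idle case is the open question.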

> > Maybe expand more on what code paths are we trying to improve? Existing
> > busy polling code is not super readable, so would be nice to simplify
> > it a bit in the process (if possible) instead of adding one more tunable.  
> 
> There are essentially three possible loops for network processing:
> 
> 1) hardirq -> softirq -> napi poll; this is the baseline functionality
> 
> 2) timer -> softirq -> napi poll; this is the deferred irq processing scheme 
> with the shortcomings described above
> 
> 3) epoll -> busy-poll -> napi poll
> 
> If a system is configured for 1), not much can be done, as it is 
> difficult to interject anything into this loop without adding state and 
> side effects. This is what we tried for the paper, but it ended up being 
> a hack.
> 
> If however the system is configured for irq deferral, Loops 2) and 3) 
> "wrestle" with each other for control. Injecting the larger 
> irq-suspend-timeout for 'timer' in Loop 2) essentially tilts this in 
> favour of Loop 3) and creates the nice pattern described above.
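
For reference, the timer selection described in the quoted text can be
modelled roughly as below. This is a sketch under assumptions, not the
code from the patches: the function name is hypothetical and the
nanosecond values are purely illustrative.

```python
def loop2_timer_ns(gro_flush_timeout_ns: int,
                   irq_suspend_timeout_ns: int,
                   prefer_busy_poll: bool) -> int:
    """Pick the timeout that arms Loop 2 (timer -> softirq -> napi poll).

    Hypothetical model: the larger irq_suspend_timeout applies only while
    an application uses prefer-busy-poll; otherwise the regular
    gro_flush_timeout deferral stays in effect.
    """
    if prefer_busy_poll and irq_suspend_timeout_ns > 0:
        return irq_suspend_timeout_ns
    return gro_flush_timeout_ns


# Illustrative values only: a short deferral timer and a much larger suspend timer.
GRO_FLUSH_NS = 20_000
IRQ_SUSPEND_NS = 2_000_000

busy_polling_timer = loop2_timer_ns(GRO_FLUSH_NS, IRQ_SUSPEND_NS, True)
default_timer = loop2_timer_ns(GRO_FLUSH_NS, IRQ_SUSPEND_NS, False)
```

In this model the larger timer is what tilts control towards Loop 3):
the softirq timer fires rarely enough that the epoll busy-poll loop does
the packet processing.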

