Message-ID: <20230131143210.288223c5@kernel.org>
Date:   Tue, 31 Jan 2023 14:32:10 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     peterz@...radead.org, tglx@...utronix.de
Cc:     jstultz@...gle.com, edumazet@...gle.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] softirq: avoid spurious stalls due to
 need_resched()

On Thu, 22 Dec 2022 14:12:43 -0800 Jakub Kicinski wrote:
> need_resched() added in commit c10d73671ad3 ("softirq: reduce latencies")
> does improve latency for real workloads (for example memcache).
> Unfortunately it triggers quite often even for non-network-heavy apps
> (~900 times a second on a loaded webserver), and in a small fraction of
> cases whatever the scheduler decided to run will hold onto the CPU
> for the entire time slice.
> 
> 10ms+ stalls on a machine which is not actually under overload cause
> erratic network behavior and spurious TCP retransmits. Typical end-to-end
> latency in a datacenter is < 200us so it's common to set TCP timeout
> to 10ms or less.
> 
> The intent of the need_resched() is to let a low latency application
> respond quickly and yield (to ksoftirqd). Put a time limit on this dance.
> Ignore the fact that ksoftirqd is RUNNING if we were trying to be nice
> and the application did not yield quickly.
> 
> On a webserver loaded at 90% CPU this change reduces the number of 8ms+
> stalls the network softirq processing sees by around 10x (2/sec -> 0.2/sec).
> It also seems to reduce retransmissions by ~10% but the data is quite
> noisy.
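
For readers without the patch at hand, the idea in the quoted description
(defer to ksoftirqd when need_resched() fires, but only for a bounded time)
can be sketched roughly as below. This is a user-space illustration only,
not the code from the patch: struct softirq_sim, should_run_inline() and
DEFER_LIMIT_US are hypothetical names, and the 2ms limit is an arbitrary
placeholder rather than a value taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-CPU bookkeeping for the sketch. These are NOT the
 * kernel's actual data structures.
 */
struct softirq_sim {
	bool ksoftirqd_running;     /* ksoftirqd was woken and is runnable */
	bool deferred_for_resched;  /* we deferred only because of need_resched() */
	uint64_t defer_start_us;    /* timestamp of the first such deferral */
};

/* Assumed time limit on "being nice"; placeholder value. */
#define DEFER_LIMIT_US 2000

/*
 * Returns true if pending softirqs should be processed inline now,
 * false if they should be left to ksoftirqd.
 */
static bool should_run_inline(const struct softirq_sim *s, uint64_t now_us)
{
	/* Nothing was handed off, so process inline as usual. */
	if (!s->ksoftirqd_running)
		return true;

	/*
	 * We handed off to let a latency-sensitive task run, but it did
	 * not yield within the limit: ignore that ksoftirqd is RUNNING.
	 */
	if (s->deferred_for_resched &&
	    now_us - s->defer_start_us >= DEFER_LIMIT_US)
		return true;

	/* Otherwise keep waiting for ksoftirqd. */
	return false;
}
```

A deferral made for other reasons (for example genuine overload) is left
alone in this sketch; only the need_resched() case gets the time limit.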

Peter, is there a chance you could fold this patch into your ongoing
softirq rework? We can't both work on softirq in parallel, unfortunately,
and this improvement is really key to counterbalancing whatever
heuristics CFS accumulated between 5.12 and 5.19 :(
Not to use the "r-word".

I can spin a version of this on top of your core/softirq branch, would
that work?
