Message-ID: <25de7655-6084-e6b9-1af6-c47b3d3b7dc1@kernel.org>
Date: Tue, 15 Aug 2023 14:08:42 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Cc: hawk@...nel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Wander Lairson Costa <wander@...hat.com>,
kernel-team <kernel-team@...udflare.com>, Yan Zhai <yan@...udflare.com>
Subject: Re: [RFC PATCH 2/2] softirq: Drop the warning from
do_softirq_post_smp_call_flush().
On 14/08/2023 11.35, Sebastian Andrzej Siewior wrote:
> This is an undesired situation and it has been attempted to avoid the
> situation in which ksoftirqd becomes scheduled. This changed since
> commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
> and now a threaded interrupt handler will handle soft interrupts at its
> end even if ksoftirqd is pending. That means that they will be processed
> in the context in which they were raised.
$ git describe --contains d15121be74856
v6.5-rc1~232^2~4
That revert basically removes the "overload" protection that was added
in Aug 2016 to cope with DDoS situations (Cc. Cloudflare). As described
in https://git.kernel.org/torvalds/c/4cd13c21b207 ("softirq: Let
ksoftirqd do its job"), in UDP overload situations where the UDP socket
receiver runs on the same CPU as ksoftirqd, it "falls off an edge" and
almost stops processing packets (because softirq steals CPU/sched time
from the UDP pid). I am warning Cloudflare (Cc) as this might affect
their production use-cases, and I recommend they get involved in
evaluating the effect of these changes.
I do realize/acknowledge that the reverted patch caused other latency
issues, given it was a "big-hammer" approach affecting other softirq
processing (as can be seen from e.g. the watchdog fix patches).
Thus, the revert makes sense, but how do we regain the "overload"
protection, such that RX networking cannot starve processes reading from
the socket? (Is this what Sebastian's patchset does?)
--Jesper
Thread link for people Cc'ed:
https://lore.kernel.org/all/20230814093528.117342-1-bigeasy@linutronix.de/#r