Message-ID: <CAO3-PbpX_Hzxy5aj-mppnipm2HE63oB-p51DAV7v9HvSNS9y6Q@mail.gmail.com>
Date: Tue, 15 Aug 2023 17:31:49 -0500
From: Yan Zhai <yan@...udflare.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Wander Lairson Costa <wander@...hat.com>, kernel-team <kernel-team@...udflare.com>
Subject: Re: [RFC PATCH 2/2] softirq: Drop the warning from do_softirq_post_smp_call_flush().
On Tue, Aug 15, 2023 at 7:08 AM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
>
> On 14/08/2023 11.35, Sebastian Andrzej Siewior wrote:
> > This is an undesired situation and it has been attempted to avoid the
> > situation in which ksoftirqd becomes scheduled. This changed since
> > commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
> > and now a threaded interrupt handler will handle soft interrupts at its
> > end even if ksoftirqd is pending. That means that they will be processed
> > in the context in which they were raised.
>
> $ git describe --contains d15121be74856
> v6.5-rc1~232^2~4
>
> That revert basically removes the "overload" protection that was added
> to cope with DDoS situations in Aug 2016 (Cc. Cloudflare). As described
> in https://git.kernel.org/torvalds/c/4cd13c21b207 ("softirq: Let
> ksoftirqd do its job") in UDP overload situations when UDP socket
> receiver runs on same CPU as ksoftirqd it "falls-off-an-edge" and almost
> doesn't process packets (because softirq steals CPU/sched time from UDP
> pid). Warning Cloudflare (Cc) as this might affect their production
> use-cases, and I recommend getting involved to evaluate the effect of
> these changes.
>
> I do realize/acknowledge that the reverted patch caused other latency
> issues, given it was a "big-hammer" approach affecting other softirq
> processing (as can be seen by e.g. the watchdog fixes patches).
> Thus, the revert makes sense, but how to regain the "overload"
> protection such that RX networking cannot starve processes reading from
> the socket? (is this what Sebastian's patchset does?)
>
Thanks for notifying us. We will need to evaluate if this is going to
change the picture under serious floods.
Yan
> --Jesper
>
> Thread link for people Cc'ed:
> https://lore.kernel.org/all/20230814093528.117342-1-bigeasy@linutronix.de/#r