linux-kernel - Re: [PATCH 0/3] softirq: uncontroversial change

Message-ID: <CANn89iKQ2KR23Ln9FU5RCKH89KWCNcu9QWuVLB4CcEqgoH+iRQ@mail.gmail.com>
Date:   Thu, 20 Apr 2023 19:41:57 +0200
From:   Eric Dumazet <edumazet@...gle.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Jakub Kicinski <kuba@...nel.org>, peterz@...radead.org,
        tglx@...utronix.de, jstultz@...gle.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] softirq: uncontroversial change

On Thu, Apr 20, 2023 at 7:24 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> Hi all,
> On Thu, 2022-12-22 at 14:12 -0800, Jakub Kicinski wrote:
> > Catching up on LWN I ran across the article about softirq
> > changes, and then I noticed fresh patches in Peter's tree.
> > So probably wise for me to throw these out there.
> >
> > My (can I say Meta's?) problem is the opposite to what the RT
> > sensitive people complain about. In the current scheme once
> > ksoftirqd is woken no network processing happens until it runs.
> >
> > When networking gets overloaded - that's probably fair, the problem
> > is that we confuse latency tweaks with overload protection. We have
> > a need_resched() in the loop condition (which is a latency tweak).
> > Most often we defer to ksoftirqd because we're trying to be nice
> > and let user space respond quickly, not because there is an
> > overload. But the user space may not be nice, and sit on the CPU
> > for 10ms+. Also the sirq's "work allowance" is 2ms, which is
> > uncomfortably close to the timer tick, but that's another story.
> >
> > We have a sirq latency tracker in our prod kernel which catches
> > 8ms+ stalls of net Tx (packets queued to the NIC but there is
> > no NAPI cleanup within 8ms) and with these patches applied
> > on 5.19, a fully loaded web machine sees a drop in stalls from
> > 1.8 stalls/sec to 0.16/sec. I also see a 50% drop in outgoing
> > TCP retransmissions and ~10% drop in non-TLP incoming ones.
> > This is not a network-heavy workload so most of the rtx are
> > due to scheduling artifacts.
> >
> > The network latency in a datacenter (around 10us) is somewhere
> > around a neat 1000x lower than scheduling granularity.
> >
> > These patches (patch 2 is "the meat") change what we recognize
> > as overload. Instead of just checking if "ksoftirqd is woken"
> > it also caps how long we consider ourselves to be in overload,
> > a time limit which is different based on whether we yield due
> > to real resource exhaustion vs just hitting that need_resched().
> >
> > I hope the core concept is not entirely idiotic. It'd be great
> > if we could get this in or fold an equivalent concept into ongoing
> > work from others, because due to various "scheduler improvements"
> > every time we upgrade the production kernel this problem is getting
> > worse :(
>
> Please allow me to revive this old thread.
>
> My understanding is that we want to avoid adding more heuristics here,
> preferring a consistent refactor.
>
> I would like to propose a revert of:
>
> 4cd13c21b207 softirq: Let ksoftirqd do its job
>
> and its follow-ups:
>
> 3c53776e29f8 Mark HI and TASKLET softirq synchronous
> 0f50524789fc softirq: Don't skip softirq execution when softirq thread is parking
>
> The problem originally addressed by 4cd13c21b207 can now be tackled
> with the threaded napi, available since:
>
> 29863d41bb6e net: implement threaded-able napi poll loop support
>
> Reverting the mentioned commit should address the latency issues
> mentioned by Jakub - I verified it solves a somewhat related problem in
> my setup - and reduces the layering of heuristics in this area.
>
> A refactor introducing uniform overload detection and proper resource
> control will be better, but I admit it's beyond me and anyway it could
> still land afterwards.
>
> Any opinion is more than welcome!

Seems fine, but I think a few things need to be fixed first in
napi_threaded_poll() to enable some important features that are
currently in net_rx_action() only.
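
For readers following the thread, the overload cap described in the quoted
cover letter (stop treating "ksoftirqd is woken" as an open-ended overload
signal, and bound it by a time limit that depends on why softirq processing
yielded) might be modelled roughly as below. This is a user-space sketch,
not the code from the patches: the struct, the function names and both
time limits are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Why the softirq loop stopped and handed work to ksoftirqd. */
enum yield_reason {
    YIELD_NEED_RESCHED,     /* latency tweak: a task wants the CPU */
    YIELD_BUDGET_EXHAUSTED  /* real overload: work allowance used up */
};

/* Hypothetical caps, in microseconds; the real values are not given here. */
#define RESCHED_DEFER_US    2000ULL
#define OVERLOAD_DEFER_US 100000ULL

/* Hypothetical per-CPU state. */
struct sirq_state {
    bool     ksoftirqd_woken;    /* ksoftirqd has pending work */
    uint64_t overload_until_us;  /* deadline after which we stop deferring */
};

/* Record that the softirq loop yielded at time now_us. */
static void sirq_yield(struct sirq_state *s, uint64_t now_us,
                       enum yield_reason why)
{
    uint64_t cap = (why == YIELD_BUDGET_EXHAUSTED) ? OVERLOAD_DEFER_US
                                                   : RESCHED_DEFER_US;

    s->ksoftirqd_woken = true;
    s->overload_until_us = now_us + cap;
}

/* Record that ksoftirqd got to run and drained its work. */
static void sirq_ksoftirqd_ran(struct sirq_state *s)
{
    s->ksoftirqd_woken = false;
}

/*
 * Should inline softirq processing be skipped and left to ksoftirqd?
 * Without the time cap this would return ksoftirqd_woken alone, so a
 * ksoftirqd that is starved of CPU time would block processing indefinitely.
 */
static bool sirq_must_defer(const struct sirq_state *s, uint64_t now_us)
{
    return s->ksoftirqd_woken && now_us < s->overload_until_us;
}
```

In this model a yield caused by need_resched() blocks inline processing only
for the short window, while a yield caused by budget exhaustion keeps
deferring to ksoftirqd for the longer one.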