netdev - Re: [PATCH 3/3] softirq: don't yield if only expedited handlers are pending

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iK2NTz_M-OtcN5iATUacMaseNLi42QipuxDF3MMQCEVHg@mail.gmail.com>
Date:   Mon, 9 Jan 2023 11:16:45 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Jakub Kicinski <kuba@...nel.org>, tglx@...utronix.de,
        jstultz@...gle.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] softirq: don't yield if only expedited handlers are pending

On Mon, Jan 9, 2023 at 10:44 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Dec 22, 2022 at 02:12:44PM -0800, Jakub Kicinski wrote:
> > In networking we try to keep Tx packet queues small, so we limit
> > how many bytes a socket may packetize and queue up. Tx completions
> > (from NAPI) notify the sockets when packets have left the system
> > (NIC Tx completion) and the socket schedules a tasklet to queue
> > the next batch of frames.
> >
> > This leads to a situation where we go thru the softirq loop twice.
> > First round we have pending = NET (from the NIC IRQ/NAPI), and
> > the second iteration has pending = TASKLET (the socket tasklet).
>
> So to me that sounds like you want to fix the network code to not do
> this then. Why can't the NAPI thing directly queue the next batch; why
> do you have to do a softirq roundtrip like this?

I think Jakub refers to tcp_wfree() code, which can be called from
arbitrary contexts,
including non NAPI ones, and with the socket locked (by this thread or
another) or not locked at all
(say if skb is freed from a TX completion handler or a qdisc drop)

>
> > On two web workloads I looked at this condition accounts for 10%
> > and 23% of all ksoftirqd wake ups respectively. We run NAPI
> > which wakes some process up, we hit need_resched() and wake up
> > ksoftirqd just to run the TSQ (TCP small queues) tasklet.
> >
> > Tweak the need_resched() condition to be ignored if all pending
> > softIRQs are "non-deferred". The tasklet would run relatively
> > soon, anyway, but once ksoftirqd is woken we're risking stalls.
> >
> > I did not see any negative impact on the latency in an RR test
> > on a loaded machine with this change applied.
>
> Ignoring need_resched() will get you in trouble with RT people real
> fast.