netdev - Re: Softirq latencies causing lost ethernet packets

Message-ID: <42e84c99d8d254646fdfb66b001429fedd4c5830.camel@redhat.com>
Date:   Wed, 25 May 2022 13:01:43 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     David Laight <David.Laight@...LAB.COM>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        "'greearb@...delatech.com'" <greearb@...delatech.com>,
        "'tglx@...utronix.de'" <tglx@...utronix.de>
Cc:     "'tj@...nel.org'" <tj@...nel.org>,
        "'priikone@....fi'" <priikone@....fi>,
        "'peterz@...radead.org'" <peterz@...radead.org>
Subject: Re: Softirq latencies causing lost ethernet packets

On Wed, 2022-05-25 at 09:01 +0000, David Laight wrote:
> I've finally discovered why I'm getting a lot of lost ethernet
> packets in one of my high packet rate tests (400k/sec short UDP).
> 
> The underlying problem is that the napi callbacks need to loop
> in the softirq code.
> For my test I need the cpu to be running at well over 50% 'softint'.
> (And that is just for the ethernet receive, RPS is moving the IP/UDP
> processing elsewhere.)
> 
> The problems are caused by this bit of code in __do_softirq():
> 
>         pending = local_softirq_pending();
>         if (pending) {
>                 if (time_before(jiffies, end) && !need_resched() &&
>                     --max_restart)
>                         goto restart;
> 
>                 wakeup_softirqd();
>         }
> 
> Eric's c10d73671 changed it from:
>         if (pending) {
>                 if (--max_restart)
>                         goto restart;
> 
>                 wakeup_softirqd();
>         }
> 
> to
>         if (pending) {
>                 if (time_before(jiffies, end) && !need_resched())
>                         goto restart;
> 
>                 wakeup_softirqd();
>         }
> 
> Because just running 10 copies caused excessive latencies.
> 
> The good work was then undone by 34376a50f that added the
> 'max_restart' check back (with its limit of 10) to avoid
> an issue with stop_machine getting stuck (jiffies doesn't
> increment).
> 
> This can (probably) be fixed by setting the limit to 1000.
> 
> However there is a separate issue with the need_resched() check.
> In my tests this is stopping the softint/napi callbacks for
> anything up to 9 milliseconds - more than enough to drop packets.
> 
> The problem here is that the softirqd are low priority processes.
> The application processes that receive the UDP all run under the
> realtime scheduler (priority -51).
> If the softint interrupts my RT process it is fine.
> But the following sequence isn't:
>  - softint runs on idle process.
>  - RT process scheduled on the same cpu
>  - __do_softirq() detects need_resched() and calls wakeup_softirqd()
>  - scheduler switches from the idle to my RT process.
>  - RT process runs for several milliseconds.
>  - finally softirqd is scheduled
> 
> The softint is usually higher priority than any RT thread
> (because it just steals the context).
> But in the more unusual case of an RT process being scheduled
> while the softint is active it suddenly becomes lower priority
> than the RT process.
> 
> I'm not sure what the intended purpose of the need_resched() check is.
> I think it was Eric's first thought for a limit, but he had to
> add the jiffies test as well to avoid RCU stalls.
> 
> The jiffies test itself might be problematic.
> It is fixed at 2 jiffies - 1ms to 2ms at 1000Hz.
> I'm expecting the softint code to be running at (maybe) 80% cpu.
> So that limit would need increasing.
> There is a similar limit in the napi code - but that is configurable
> (and, I think, just causes the softint code to loop).
> 
> But if RCU stalls are a problem maybe the rcu read lock ought to
> disable softints?
> So the softint is run when the rcu lock is released.
> 
> I did try setting the softirqd processes to a much higher priority
> but that didn't seem to help - I didn't look exactly why.
> 
> While I could use processor affinities to stop the application's
> RT threads running on the softint-heavy cpu that is all hard
> and difficult to arrange.
> In any case the application can make use of the non-softint time
> on those cpu.

Overall this looks like a scenario where the napi threaded model could
help?

echo 1 > /sys/class/net/<dev name>/threaded

and then set the NAPI threads' scheduling parameters as best fits your
workload.
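
A rough sketch of how those two steps might be scripted. The interface
name (eth0) and the SCHED_FIFO priority (60) are assumptions for
illustration only; 60 is picked on the guess that the "-51" reported for
the application threads corresponds to an RT priority of 50. The script
also assumes the NAPI kthreads show up as napi/<dev>-<napi_id>, and that
pgrep and chrt (util-linux) are available. It defaults to a dry run that
only prints the commands; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: DEV, PRIO and the helper names are illustrative assumptions.
DEV="${DEV:-eth0}"        # assumed interface name
PRIO="${PRIO:-60}"        # assumed SCHED_FIFO priority for the NAPI threads
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Path of the per-device threaded-NAPI switch.
threaded_path() {
    printf '/sys/class/net/%s/threaded\n' "$1"
}

# Run a command, or just print it (prefixed with "+ ") in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: switch the device's NAPI instances to threaded mode.
enable_threaded_napi() {
    run sh -c "echo 1 > $(threaded_path "$1")"
}

# Step 2: give every NAPI kthread of the device an RT priority.
# Assumes the kthreads are named napi/<dev>-<napi_id>.
set_napi_prio() {
    for pid in $(pgrep "^napi/$1-"); do
        run chrt -f -p "$2" "$pid"
    done
}

enable_threaded_napi "$DEV"
set_napi_prio "$DEV" "$PRIO"
```

Running it as root with DRY_RUN=0 applies the changes. In a dry run the
second step normally prints nothing, because the NAPI kthreads only exist
once threaded mode has actually been enabled.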

Cheers,

Paolo