netdev - Re: Softirq latencies causing lost ethernet packets

Message-ID: <CANn89iKZhkL15b+pBft+XsUK+brxnQ6bX146Nz+YJ3FW-J1hyg@mail.gmail.com>
Date:   Wed, 25 May 2022 05:00:15 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     David Laight <David.Laight@...lab.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "greearb@...delatech.com" <greearb@...delatech.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "tj@...nel.org" <tj@...nel.org>,
        "priikone@....fi" <priikone@....fi>,
        "peterz@...radead.org" <peterz@...radead.org>
Subject: Re: Softirq latencies causing lost ethernet packets

On Wed, May 25, 2022 at 4:01 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Wed, 2022-05-25 at 09:01 +0000, David Laight wrote:
> > I've finally discovered why I'm getting a lot of lost ethernet
> > packets in one of my high packet rate tests (400k/sec short UDP).
> >
> > The underlying problem is that the napi callbacks need to loop
> > in the softirq code.
> > For my test I need the cpu to be running at well over 50% 'softint'.
> > (And that is just for the ethernet receive, RPS is moving the IP/UDP
> > processing elsewhere.)
> >
> > The problems are caused by this bit of code in __do_softirq():
> >
> >         pending = local_softirq_pending();
> >         if (pending) {
> >                 if (time_before(jiffies, end) && !need_resched() &&
> >                     --max_restart)
> >                         goto restart;
> >
> >                 wakeup_softirqd();
> >         }
> >
> > Eric's c10d73671 changed it from:
> >         if (pending) {
> >                 if (--max_restart)
> >                         goto restart;
> >
> >                 wakeup_softirqd();
> >         }
> >
> > to
> >         if (pending) {
> >                 if (time_before(jiffies, end) && !need_resched())
> >                         goto restart;
> >
> >                 wakeup_softirqd();
> >         }
> >
> > Because just running 10 copies caused excessive latencies.
> >
> > The good work was then undone by 34376a50f that added the
> > 'max_restart' check back (with its limit of 10) to avoid
> > an issue with stop_machine getting stuck (jiffies doesn't
> > increment).
> >
> > This can (probably) be fixed by setting the limit to 1000.
> >
> > However there is a separate issue with the need_resched() check.
> > In my tests this is stopping the softint/napi callbacks for
> > anything up to 9 milliseconds - more than enough to drop packets.
> >
> > The problem here is that the softirqd are low priority processes.
> > The application processes that receive the UDP all run under the
> > realtime scheduler (priority -51).
> > If the softint interrupts my RT process it is fine.
> > But the following sequence isn't:
> >  - softint runs on idle process.
> >  - RT process scheduled on the same cpu
> >  - __do_softirq() detects need_resched() calls wakeup_softirqd()
> >  - scheduler switches from the idle to my RT process.
> >  - RT process runs for several milliseconds.
> >  - finally softirqd is scheduled
> >
> > The softint is usually higher priority than any RT thread
> > (because it just steals the context).
> > But in the more unusual case of an RT process being scheduled
> > while the softint is active it suddenly becomes lower priority
> > than the RT process.
> >
> > I'm not sure what the intended purpose of the need_resched() check is.
> > I think it was Eric's first thought for a limit, but he had to
> > add the jiffies test as well to avoid RCU stalls.
> >
> > The jiffies test itself might be problematic.
> > It is fixed at 2 jiffies - 1ms to 2ms at 1000Hz.
> > I'm expecting the softint code to be running at (maybe) 80% cpu.
> > So that limit would need increasing.
> > There is a similar limit in the napi code - but that is configurable
> > (and, I think, just causes the softint code to loop).
> >
> > But if RCU stalls are a problem maybe the rcu read lock ought to
> > disable softints?
> > So the softint is run when the rcu lock is released.
> >
> > I did try setting the softirqd processes to a much higher priority
> > but that didn't seem to help - I didn't look exactly why.
> >
> > While I could use processor affinities to stop the application's
> > RT threads running on the softint-heavy cpu, that is
> > difficult to arrange.
> > In any case the application can make use of the non-softint time
> > on those cpu.
>
> Overall this looks like a scenario where the napi threaded model could
> help?
>
> echo 1 > /sys/class/net/<dev name>/threaded
>
> and then set the napi threads' scheduling parameters as best
> fits you.
>
> Cheers,
>
> Paolo
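
A minimal sketch of the threaded-NAPI suggestion above in C, in case it
helps. It writes "1" to the sysfs file named in the echo command and then
moves one NAPI kthread to SCHED_FIFO. The helper names
(napi_threaded_path, enable_threaded_napi, set_napi_thread_fifo) are
hypothetical, the caller is assumed to be root, and the kthread's pid is
assumed to have been looked up separately.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/sys/class/net/<dev>/threaded" into buf.
 * Returns 0 on success, -1 if the path does not fit in len bytes. */
int napi_threaded_path(const char *dev, char *buf, size_t len)
{
	assert(dev != NULL && buf != NULL);
	int n = snprintf(buf, len, "/sys/class/net/%s/threaded", dev);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of: echo 1 > /sys/class/net/<dev>/threaded */
int enable_threaded_napi(const char *dev)
{
	char path[128];
	if (napi_threaded_path(dev, path, sizeof(path)) != 0)
		return -1;

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = (fputs("1\n", f) >= 0);
	if (fclose(f) != 0)	/* the write may only fail at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Put one NAPI kthread (pid supplied by the caller) under SCHED_FIFO
 * at the given real-time priority. Returns 0 on success, -1 on error. */
int set_napi_thread_fifo(pid_t pid, int prio)
{
	struct sched_param sp;
	memset(&sp, 0, sizeof(sp));
	sp.sched_priority = prio;
	return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Usage would be something like enable_threaded_napi("eth0") followed by
set_napi_thread_fifo(pid, prio) for each NAPI kthread; the device name
and the priority are placeholders. The kthreads should show up in ps
under names of the form napi/<dev>-<id>.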

Also, make sure your user threads are not allowed to run on the cpu
servicing NIC interrupts.
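
A rough sketch of one way to do that from inside the application, using
sched_setaffinity(). The helper names (build_mask_excluding,
avoid_irq_cpu) are hypothetical, and the CPU that services the NIC
interrupts is assumed to be known already (for example from
/proc/interrupts).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Fill 'set' with CPUs 0..ncpus-1, leaving out irq_cpu.
 * Returns the number of CPUs that remain in the set. */
int build_mask_excluding(cpu_set_t *set, int ncpus, int irq_cpu)
{
	assert(ncpus > 0 && ncpus <= CPU_SETSIZE);
	CPU_ZERO(set);
	for (int cpu = 0; cpu < ncpus; cpu++)
		if (cpu != irq_cpu)
			CPU_SET(cpu, set);
	return CPU_COUNT(set);
}

/* Restrict the calling thread to every CPU except irq_cpu.
 * Returns 0 on success, -1 on error or if no CPU would be left. */
int avoid_irq_cpu(int ncpus, int irq_cpu)
{
	cpu_set_t set;
	if (build_mask_excluding(&set, ncpus, irq_cpu) == 0)
		return -1;	/* would leave the thread nowhere to run */
	return sched_setaffinity(0, sizeof(set), &set);	/* pid 0 = caller */
}
```

Each RT thread would call avoid_irq_cpu() once at startup, before it
begins receiving. The same effect can be had from outside the process
with taskset or cpusets if changing the application is not an option.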