Message-ID: <20220525085647.6dfb7ed0@kernel.org>
Date:   Wed, 25 May 2022 08:56:47 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     David Laight <David.Laight@...LAB.COM>
Cc:     'Pavan Chebbi' <pavan.chebbi@...adcom.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Michael Chan <michael.chan@...adcom.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: Re: tg3 dropping packets at high packet rates

On Wed, 25 May 2022 07:28:42 +0000 David Laight wrote:
> > As the trace below shows, I think the underlying problem
> > is that the NAPI callbacks aren't being made in a timely manner.
> 
> Further investigation has shown that this is actually
> a generic problem with the way NAPI callbacks are called
> from the softirq handler.
> 
> The underlying problem is the effect of this code
> in __do_softirq():
> 
>         pending = local_softirq_pending();
>         if (pending) {
>                 if (time_before(jiffies, end) && !need_resched() &&
>                     --max_restart)
>                         goto restart;
> 
>                 wakeup_softirqd();
>         }
> 
> The NAPI processing can loop through here and needs to take
> the 'goto restart' path - not doing so will drop packets.
> The need_resched() test is particularly troublesome.
> I've also had to increase the limit for 'max_restart' from
> its (hard-coded) 10 to 1000 (100 isn't enough).
> I'm not sure whether I'm hitting the jiffies limit,
> but that is hard-coded at 2ms.
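
For reference, the limits being hit here are the hard-coded constants
near the top of kernel/softirq.c (values as of recent kernels; they
may differ across versions):

        /* kernel/softirq.c: bounds on the __do_softirq() restart loop.
         * MAX_SOFTIRQ_TIME caps total time spent looping (~2ms),
         * MAX_SOFTIRQ_RESTART caps the number of 'goto restart' passes.
         */
        #define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
        #define MAX_SOFTIRQ_RESTART 10
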
> 
> I'm going to start another thread.

If you share a core between the application and NAPI, try prefer busy
polling (SO_PREFER_BUSY_POLL) and manage polling from user space.
If you have separate cores, use threaded NAPI and either isolate the
core running NAPI or give it high prio.
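
A minimal sketch of the prefer-busy-poll setup (assumes a 5.11+
kernel and headers that define SO_PREFER_BUSY_POLL; the timeout and
budget values below are illustrative, not tuned):

        #include <stdio.h>
        #include <sys/socket.h>

        /* Opt the socket in to prefer busy poll: the NIC irq is kept
         * deferred while user space drives NAPI by busy polling
         * (e.g. via recv/epoll on this socket).
         */
        static int setup_busy_poll(int fd)
        {
                int one = 1;       /* enable prefer busy poll */
                int usecs = 200;   /* busy-poll timeout, microseconds */
                int budget = 64;   /* packets per poll from this context */

                if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                               &one, sizeof(one)) ||
                    setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                               &usecs, sizeof(usecs)) ||
                    setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                               &budget, sizeof(budget))) {
                        perror("setsockopt");
                        return -1;
                }
                return 0;
        }

For the separate-core case, threaded NAPI can be switched on per
device (5.12+) with 'echo 1 > /sys/class/net/<dev>/threaded', after
which the napi/<dev>-<id> kthreads can be pinned or given elevated
prio with taskset/chrt.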

YMMV, but I've spent more time than I'd like to admit looking at the
softirq yielding conditions; they are hard to beat :( If you control
the app, arranging busy polling or pinning things is a much better
use of your time.
