netdev - Re: tg3 dropping packets at high packet rates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5cc5353c518e27de69fc0d832294634c83f431e5.camel@redhat.com>
Date:   Thu, 19 May 2022 15:29:22 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     David Laight <David.Laight@...LAB.COM>,
        'Pavan Chebbi' <pavan.chebbi@...adcom.com>
Cc:     Michael Chan <michael.chan@...adcom.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: Re: tg3 dropping packets at high packet rates

On Thu, 2022-05-19 at 13:14 +0000, David Laight wrote:
> From: Pavan Chebbi
> > Sent: 19 May 2022 11:21
> ...
> > > 
> > > > Please show a snapshot of all the counters.  In particular,
> > > > rxbds_empty, rx_discards, etc will show whether the driver is keeping
> > > > up with incoming RX packets or not.
> > > 
> > > After running the test for a short time.
> > > The application stats indicate that around 40000 packets are missing.
> > > 
> ...
> 
> Some numbers taken at the same time:
> 
> Application trace - each 'gap' is one or more lost packets.
> T+000004:  all gaps so far 1104
> T+000005:  all gaps so far 21664
> T+000006:  all gaps so far 54644
> T+000007:  all gaps so far 84641
> T+000008:  all gaps so far 110232
> T+000009:  all gaps so far 131191
> T+000010:  all gaps so far 150286
> T+000011:  all gaps so far 171588
> T+000012:  all gaps so far 190777
> T+000013:  all gaps so far 210771
> 
> rx_packets counted by tg3_rx() and read every second.
> 63 344426
> 64 341734
> 65 338740
> 66 337995
> 67 339770
> 68 336314
> 69 340087
> 70 345084
> 
> Cumulative error counts since the driver was last loaded.
>      rxbds_empty: 30983
>      rx_discards: 3123
>      mbuf_lwm_thresh_hit: 3123
> 
> The number of interrupt is high - about 40000/sec.
> (I've not deltad these, just removed all the zeros and prefixed the
> cpu number before each non-zero value.)
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234754517
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234767945
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234802555
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234843542
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234887963
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234928204
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234966428
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235009505
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235052740
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235093254
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235133299
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235173151
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235212387
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235252403
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235317928
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235371301
> 
> RSS is enabled, but I've used ethtool -X equal 1 to
> put everything through ring 0.
> Cpu 14 is still 25% idle - that is the busiest cpu.

If the packet processing is 'bursty', you can have idle time and still
hit now and the 'rx ring is [almost] full' condition. If pause frames
are enabled, that will cause the peer to stop sending frames: drop can
happen in the switch, and the local NIC will not notice (unless there
are counters avaialble for pause frames sent).

AFAICS the packet processing is bursty, because enqueuing packets to a
remote CPU in considerably faster then full network stack processing.

Side note: on a not-to-obsolete H/W the kernel should be able to
process >1mpps per cpu.

Paolo