Date:   Thu, 19 May 2022 15:29:22 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     David Laight <David.Laight@...LAB.COM>,
        'Pavan Chebbi' <pavan.chebbi@...adcom.com>
Cc:     Michael Chan <michael.chan@...adcom.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: Re: tg3 dropping packets at high packet rates

On Thu, 2022-05-19 at 13:14 +0000, David Laight wrote:
> From: Pavan Chebbi
> > Sent: 19 May 2022 11:21
> ...
> > > 
> > > > Please show a snapshot of all the counters.  In particular,
> > > > rxbds_empty, rx_discards, etc will show whether the driver is keeping
> > > > up with incoming RX packets or not.
> > > 
> > > After running the test for a short time.
> > > The application stats indicate that around 40000 packets are missing.
> > > 
> ...
> 
> Some numbers taken at the same time:
> 
> Application trace - each 'gap' is one or more lost packets.
> T+000004:  all gaps so far 1104
> T+000005:  all gaps so far 21664
> T+000006:  all gaps so far 54644
> T+000007:  all gaps so far 84641
> T+000008:  all gaps so far 110232
> T+000009:  all gaps so far 131191
> T+000010:  all gaps so far 150286
> T+000011:  all gaps so far 171588
> T+000012:  all gaps so far 190777
> T+000013:  all gaps so far 210771
> 
> rx_packets counted by tg3_rx() and read every second.
> 63 344426
> 64 341734
> 65 338740
> 66 337995
> 67 339770
> 68 336314
> 69 340087
> 70 345084
> 
> Cumulative error counts since the driver was last loaded.
>      rxbds_empty: 30983
>      rx_discards: 3123
>      mbuf_lwm_thresh_hit: 3123
> 
> The number of interrupt is high - about 40000/sec.
> (I've not deltad these, just removed all the zeros and prefixed the
> cpu number before each non-zero value.)
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234754517
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234767945
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234802555
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234843542
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234887963
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234928204
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234966428
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235009505
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235052740
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235093254
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235133299
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235173151
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235212387
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235252403
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235317928
> 86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235371301
> 
> RSS is enabled, but I've used ethtool -X equal 1 to
> put everything through ring 0.
> Cpu 14 is still 25% idle - that is the busiest cpu.

If the packet processing is 'bursty', you can have idle time and still
hit the 'rx ring is [almost] full' condition now and then. If pause
frames are enabled, that will cause the peer to stop sending frames:
drops can happen in the switch, and the local NIC will not notice
(unless there are counters available for pause frames sent).

AFAICS the packet processing is bursty, because enqueuing packets to a
remote CPU is considerably faster than full network stack processing.
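
To illustrate the point about burstiness, here is a toy model (not tg3
code) of a fixed-size rx ring fed at a steady rate and drained by a
consumer that alternates between polling the ring and being busy
elsewhere. The ring size, the rates and the phase lengths are made-up
values chosen only to show the effect; simulate() and its parameters
are hypothetical names.

```python
def simulate(ring_size, service_per_tick, poll_ticks, stall_ticks, total_ticks):
    """Toy rx ring: one packet arrives per tick, the consumer polls in phases.

    Returns (dropped, idle_fraction), where idle_fraction is the share of
    the consumer's total capacity that went unused while it was polling.
    """
    queued = 0
    dropped = 0
    unused = 0
    period = poll_ticks + stall_ticks
    for t in range(total_ticks):
        # Consumer side: drains the ring only during the polling phase.
        if t % period < poll_ticks:
            served = min(queued, service_per_tick)
            queued -= served
            unused += service_per_tick - served
        # NIC side: one packet per tick, dropped if the ring is full.
        if queued < ring_size:
            queued += 1
        else:
            dropped += 1
    idle_fraction = unused / (service_per_tick * total_ticks)
    return dropped, idle_fraction


# Assumed numbers: 200-entry ring, consumer 4x faster than the arrival rate.
# Stalls longer than the ring can absorb cause drops despite plenty of idle time.
long_stall = simulate(ring_size=200, service_per_tick=4,
                      poll_ticks=700, stall_ticks=300, total_ticks=10_000)
# Stalls shorter than the ring depth are absorbed without loss.
short_stall = simulate(ring_size=200, service_per_tick=4,
                       poll_ticks=900, stall_ticks=100, total_ticks=10_000)
print("long stalls :", long_stall)
print("short stalls:", short_stall)
```

In this sketch the consumer has four times the capacity it needs on
average, yet it still loses packets whenever a stall outlasts the ring
depth, which is the pattern described above.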

Side note: on not-too-obsolete H/W the kernel should be able to
process >1 Mpps per CPU.
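
For scale, a back-of-the-envelope calculation with the rounded figures
quoted in this thread (roughly 340000 rx_packets/sec and about 40000
interrupts/sec); the variable names are only illustrative:

```python
# Rounded figures taken from the quoted measurements in this thread.
rx_pps = 340_000        # rx_packets per second counted by tg3_rx()
irq_per_sec = 40_000    # "about 40000/sec"
target_pps = 1_000_000  # the ">1 Mpps per CPU" ballpark

ns_per_packet_observed = 1e9 / rx_pps    # time available per packet today
ns_per_packet_target = 1e9 / target_pps  # time available per packet at 1 Mpps
packets_per_irq = rx_pps / irq_per_sec   # average batch size per interrupt

print(f"observed rate: {ns_per_packet_observed:.0f} ns per packet")
print(f"1 Mpps target: {ns_per_packet_target:.0f} ns per packet")
print(f"average packets per interrupt: {packets_per_irq:.1f}")
```

These are averages only and say nothing about the burst behaviour
discussed above.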

Paolo
