Message-ID: <13d6579e9bc44dc2bfb73de8d9715b10@AcuMS.aculab.com>
Date:   Thu, 19 May 2022 08:44:53 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Michael Chan' <michael.chan@...adcom.com>
CC:     Paolo Abeni <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        "David Miller" <davem@...emloft.net>,
        Pavan Chebbi <pavan.chebbi@...adcom.com>
Subject: RE: tg3 dropping packets at high packet rates

From: Michael Chan
> Sent: 19 May 2022 01:52
> 
> On Wed, May 18, 2022 at 2:31 PM David Laight <David.Laight@...lab.com> wrote:
> >
> > From: Paolo Abeni
> > > Sent: 18 May 2022 18:27
> > ....
> > > > If I read /sys/class/net/em2/statistics/rx_packets every second
> > > > delaying with:
> > > >   syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
> > > > about every 43 seconds I get a zero increment.
> > > > This really doesn't help!
> > >
> > > It looks like the tg3 driver fetches the H/W stats once per second. I
> > > guess that if you fetch them with the same period and you are unlucky
> > > you can read the same sample two consecutive times.
> >
> > Actually I think the hardware is writing them to kernel memory
> > every second.
> 
> On your BCM95720 chip, statistics are gathered by tg3_timer() once a
> second.  Older chips will use DMA.

Ah, I wasn't sure which code was relevant.
FWIW the code could rotate 64-bit values by 32 bits
to convert to/from the strange word ordering the hardware uses.
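
To illustrate the rotate, here is a minimal sketch. It assumes each
hardware counter is laid out as two 32-bit words with the high word
first, and that the host loads the pair as a single 64-bit value with
the halves swapped. The name swap_halves64() is hypothetical and is
not taken from the tg3 driver.

```c
#include <stdint.h>

/*
 * Rotate a 64-bit value by 32 bits, i.e. swap its two 32-bit halves.
 * Assumption: the raw counter was loaded with its high and low words
 * in the opposite order to the host's native 64-bit layout.
 * The operation is its own inverse, so it converts in both directions.
 */
static inline uint64_t swap_halves64(uint64_t raw)
{
    return (raw << 32) | (raw >> 32);
}
```

Most compilers should recognise this pattern and emit a single rotate
instruction on 64-bit targets.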

> Please show a snapshot of all the counters.  In particular,
> rxbds_empty, rx_discards, etc will show whether the driver is keeping
> up with incoming RX packets or not.

Here they are after running the test for a short time.
The application stats indicate that around 40000 packets are missing.

# ethtool -S em2 | grep -v ' 0$'; for f in /sys/class/net/em2/statistics/*; do echo $f $(cat $f); done|grep -v ' 0$'
NIC statistics:
     rx_octets: 4589028558
     rx_ucast_packets: 21049866
     rx_mcast_packets: 763
     rx_bcast_packets: 746
     tx_octets: 4344
     tx_ucast_packets: 6
     tx_mcast_packets: 40
     tx_bcast_packets: 3
     rxbds_empty: 76
     rx_discards: 14
     mbuf_lwm_thresh_hit: 14
/sys/class/net/em2/statistics/multicast 763
/sys/class/net/em2/statistics/rx_bytes 4589028558
/sys/class/net/em2/statistics/rx_missed_errors 14
/sys/class/net/em2/statistics/rx_packets 21433169
/sys/class/net/em2/statistics/tx_bytes 4344
/sys/class/net/em2/statistics/tx_packets 49

I've replaced the rx_packets count with an atomic64 counter in tg3_rx().
Reading every second gives values like:

# echo_every 1 |(c=0; n0=0; while read r; do n=$(cat /sys/class/net/em2/statistics/rx_packets); echo $c $((n - n0)); c=$((c+1)); n0=$n; done)
0 397169949
1 399831
2 399883
3 399913
4 399871
5 398747
6 400035
7 399958
8 399947
9 399923
10 399978
11 399457
12 399130
13 400128
14 399808
15 399029

(The first line is the running total, because n0 starts at 0.)
The rest should all be about 400000, with only slight variation.
But hundreds of packets are clearly being discarded in some
1-second periods.
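
For reference, the per-second sampling can also be done in C with
absolute deadlines, so that wake-up latency does not accumulate from
one sample to the next. This is only a sketch: it uses the libc
clock_nanosleep() wrapper rather than the raw syscall, and the helper
names read_counter(), advance_deadline() and print_deltas() are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_nanosleep() */
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Read one unsigned decimal counter from a sysfs-style file.
 * Returns 0 on success, -1 on failure. */
static int read_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fscanf(f, "%" SCNu64, value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Advance an absolute deadline by period_ns, keeping tv_nsec in range. */
static void advance_deadline(struct timespec *ts, long period_ns)
{
    assert(period_ns > 0);
    ts->tv_sec += period_ns / 1000000000L;
    ts->tv_nsec += period_ns % 1000000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec++;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Print `samples` lines of "index increment" for the counter at `path`.
 * Returns 0 on success, -1 on any failure. */
static int print_deltas(const char *path, long period_ns, int samples)
{
    struct timespec ts;
    uint64_t prev, now;
    int rc;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) || read_counter(path, &prev))
        return -1;
    for (int i = 0; i < samples; i++) {
        advance_deadline(&ts, period_ns);
        /* Sleep until the absolute deadline; retry if a signal interrupts. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
        } while (rc == EINTR);
        if (rc || read_counter(path, &now))
            return -1;
        printf("%d %" PRIu64 "\n", i, now - prev);
        prev = now;
    }
    return 0;
}
```

Calling print_deltas("/sys/class/net/em2/statistics/rx_packets",
1000000000L, 16) would produce output in the same shape as the list
above, except that the first line is already an increment rather than
the running total.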

I don't think I can blame the network.
All the systems are plugged into the same ethernet switch on a test LAN.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
