Date:   Thu, 19 May 2022 15:50:58 +0530
From:   Pavan Chebbi <pavan.chebbi@...adcom.com>
To:     David Laight <David.Laight@...lab.com>
Cc:     Michael Chan <michael.chan@...adcom.com>,
        Paolo Abeni <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: Re: tg3 dropping packets at high packet rates

On Thu, May 19, 2022 at 2:14 PM David Laight <David.Laight@...lab.com> wrote:
>
> From: Michael Chan
> > Sent: 19 May 2022 01:52
> >
> > On Wed, May 18, 2022 at 2:31 PM David Laight <David.Laight@...lab.com> wrote:
> > >
> > > From: Paolo Abeni
> > > > Sent: 18 May 2022 18:27
> > > ....
> > > > > If I read /sys/class/net/em2/statistics/rx_packets every second
> > > > > delaying with:
> > > > >   syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
> > > > > about every 43 seconds I get a zero increment.
> > > > > This really doesn't help!
> > > >
> > > > It looks like the tg3 driver fetches the H/W stats once per second. I
> > > > guess that if you fetch them with the same period and you are unlucky
> > > > you can read the same sample 2 consecutive times.
> > >
> > > Actually I think the hardware is writing them to kernel memory
> > > every second.
> >
> > On your BCM95720 chip, statistics are gathered by tg3_timer() once a
> > second.  Older chips will use DMA.
>
> Ah, I wasn't sure which code was relevant.
> FWIW the code could rotate 64bit values by 32 bits
> to convert to/from the strange ordering the hardware uses.
>
> > Please show a snapshot of all the counters.  In particular,
> > rxbds_empty, rx_discards, etc will show whether the driver is keeping
> > up with incoming RX packets or not.
>
> After running the test for a short time.
> The application stats indicate that around 40000 packets are missing.
>
> # ethtool -S em2 | grep -v ' 0$'; for f in /sys/class/net/em2/statistics/*; do echo $f $(cat $f); done|grep -v ' 0$'
> NIC statistics:
>      rx_octets: 4589028558
>      rx_ucast_packets: 21049866
>      rx_mcast_packets: 763
>      rx_bcast_packets: 746
>      tx_octets: 4344
>      tx_ucast_packets: 6
>      tx_mcast_packets: 40
>      tx_bcast_packets: 3
>      rxbds_empty: 76
>      rx_discards: 14
>      mbuf_lwm_thresh_hit: 14
> /sys/class/net/em2/statistics/multicast 763
> /sys/class/net/em2/statistics/rx_bytes 4589028558
> /sys/class/net/em2/statistics/rx_missed_errors 14
> /sys/class/net/em2/statistics/rx_packets 21433169
> /sys/class/net/em2/statistics/tx_bytes 4344
> /sys/class/net/em2/statistics/tx_packets 49
>
> I've replaced the rx_packets count with an atomic64 counter in tg3_rx().
> Reading every second gives values like:
>
> # echo_every 1 |(c=0; n0=0; while read r; do n=$(cat /sys/class/net/em2/statistics/rx_packets); echo $c $((n - n0)); c=$((c+1)); n0=$n; done)
> 0 397169949
> 1 399831
> 2 399883
> 3 399913
> 4 399871
> 5 398747
> 6 400035
> 7 399958
> 8 399947
> 9 399923
> 10 399978
> 11 399457
> 12 399130
> 13 400128
> 14 399808
> 15 399029
>

Over the 15 one-second samples above (1 to 15), the received packet
count is 4362 lower than expected, assuming an average of 400000
packets/s.
Over what time period did the application report 40000 missing packets?
Does it correspond to about 150 seconds of test time?
The error counters do not look suspicious for the reported problem at
this point.
Do you see this problem with any other traffic pattern?
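
For reference, the shortfall figure can be rechecked with a short
sketch. It sums (400000 - delta) over samples 1 to 15 of the quoted
output (sample 0 is skipped because it is the initial absolute counter
value) and then extrapolates how long it would take to lose 40000
packets at that average rate. The 400000/s nominal rate and the
variable names are assumptions made for illustration only.

```shell
#!/bin/sh
# Per-second rx_packets deltas for samples 1..15, copied from the quoted output.
deltas="399831 399883 399913 399871 398747 400035 399958 399947 399923 399978 399457 399130 400128 399808 399029"

expected=400000   # assumed nominal rate in packets per second
missing=40000     # missing packets reported by the application

shortfall=0
n=0
for d in $deltas; do
    # Samples above the nominal rate reduce the running shortfall.
    shortfall=$((shortfall + expected - d))
    n=$((n + 1))
done

echo "samples=$n shortfall=$shortfall"
# Integer estimate of the seconds needed to accumulate $missing lost packets.
echo "seconds_for_${missing}=$((missing * n / shortfall))"
```

With the values above this prints samples=15 shortfall=4362 and
seconds_for_40000=137, which is in the same range as the rough 150
second estimate.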

> They should all be 400000 with slight variances.
> But there are clearly 100s of packets being discarded in some
> 1 second periods.
>
> I don't think I can blame the network.
> All the systems are plugged into the same ethernet switch on a test LAN.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
