Message-ID: <153739175cf241a5895e6a5685a89598@AcuMS.aculab.com>
Date:   Wed, 18 May 2022 21:31:08 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Paolo Abeni' <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     "'mchan@...adcom.com'" <mchan@...adcom.com>,
        David Miller <davem@...emloft.net>
Subject: RE: tg3 dropping packets at high packet rates

From: Paolo Abeni
> Sent: 18 May 2022 18:27
....
> > If I read /sys/class/net/em2/statistics/rx_packets every second
> > delaying with:
> >   syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
> > about every 43 seconds I get a zero increment.
> > This really doesn't help!
> 
> It looks like the tg3 driver fetches the H/W stats once per second. I
> guess that if you fetch them with the same period and you are unlucky
> you can read the same sample two consecutive times.

Actually I think the hardware is writing them to kernel memory
every second.
This really isn't ideal for packet counts.
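For reference, the read loop is roughly the following (a simplified
sketch of what my test does, using the libc clock_nanosleep() wrapper
rather than the raw syscall; interface name and 1s interval as above,
error handling trimmed):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	struct timespec ts;
	unsigned long long prev = 0, cur;
	char buf[64];
	FILE *f;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	for (;;) {
		/* Sleep to an absolute 1-second grid so the sample
		 * period doesn't drift with the work done below. */
		ts.tv_sec += 1;
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);

		f = fopen("/sys/class/net/em2/statistics/rx_packets", "r");
		if (!f)
			return 1;
		if (!fgets(buf, sizeof buf, f)) {
			fclose(f);
			return 1;
		}
		fclose(f);

		cur = strtoull(buf, NULL, 10);
		printf("%llu\n", cur - prev);
		prev = cur;
	}
}

Either way, if the counters are only refreshed once a second on a clock
that isn't mine, the two 1Hz periods slowly beat against each other,
which would explain the zero increment roughly every 43 seconds.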

...
> With RPS enabled, packet processing for most packets (the ones steered
> to remote CPUs) is very cheap, as the skbs are moved out of the NIC to a
> per-CPU queue and that's it.

It may be 'cheap', but at 400000 frames/sec it adds up.
The processing in tg3 is relatively light - depending on
the actual hardware.
From what I remember, the e1000 driver is worse in that respect.

> In theory packets could be dropped before inserting them into the RPS
> queue, if the latter grows too big, but that looks unlikely. You can try
> raising netdev_max_backlog, just in case.

I'm pretty sure nothing is being dropped that late on.
The CPUs processing the RPS data are maxing out at around 13% 'softint'
in 5.18 - quite a lot more than the 10% with 3.10.
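If drops were happening at the RPS backlog, they should show up in the
second column of /proc/net/softnet_stat (the per-CPU count of packets
discarded because the backlog queue was full, as I understand the
format). Something like this quick check (illustrative sketch, not
taken from my test program) would show it:

#include <stdio.h>

int main(void)
{
	/* Each row of /proc/net/softnet_stat is one CPU; the fields are
	 * hex: processed, dropped, time_squeeze, ... */
	FILE *f = fopen("/proc/net/softnet_stat", "r");
	unsigned int processed, dropped, squeezed;
	char line[512];
	int cpu = 0;

	if (!f)
		return 1;
	while (fgets(line, sizeof line, f)) {
		if (sscanf(line, "%x %x %x", &processed, &dropped, &squeezed) == 3)
			printf("cpu%d: processed %u dropped %u squeezed %u\n",
			       cpu, processed, dropped, squeezed);
		cpu++;
	}
	fclose(f);
	return 0;
}

If that column stays flat while packets go missing, the loss is
earlier, in the NIC or driver, which is what I suspect here.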

> dropwatch (or perf record -ga -e skb:kfree_skb) should point you where
> exactly the packets are dropped.

I'm 99.9% sure the packets aren't even getting into skbs.
I can just about run very selective ftrace traces.
They don't seem to show anything being dropped.
But it is very difficult to see anything at these
packet rates.

	David

