Message-ID: <f3d1d5bf11144b31b1b3959e95b04490@AcuMS.aculab.com>
Date: Thu, 19 May 2022 13:14:53 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Pavan Chebbi' <pavan.chebbi@...adcom.com>
CC: Michael Chan <michael.chan@...adcom.com>,
Paolo Abeni <pabeni@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"mchan@...adcom.com" <mchan@...adcom.com>,
David Miller <davem@...emloft.net>
Subject: RE: tg3 dropping packets at high packet rates
From: Pavan Chebbi
> Sent: 19 May 2022 11:21
...
> >
> > > Please show a snapshot of all the counters. In particular,
> > > rxbds_empty, rx_discards, etc will show whether the driver is keeping
> > > up with incoming RX packets or not.
> >
> > After running the test for a short time.
> > The application stats indicate that around 40000 packets are missing.
> >
...
Some numbers taken at the same time:
Application trace - each 'gap' is one or more lost packets.
T+000004: all gaps so far 1104
T+000005: all gaps so far 21664
T+000006: all gaps so far 54644
T+000007: all gaps so far 84641
T+000008: all gaps so far 110232
T+000009: all gaps so far 131191
T+000010: all gaps so far 150286
T+000011: all gaps so far 171588
T+000012: all gaps so far 190777
T+000013: all gaps so far 210771
rx_packets counted by tg3_rx() and read every second
(the values are per-second deltas):
63 344426
64 341734
65 338740
66 337995
67 339770
68 336314
69 340087
70 345084
Cumulative error counts since the driver was last loaded:
rxbds_empty: 30983
rx_discards: 3123
mbuf_lwm_thresh_hit: 3123
The number of interrupts is high - about 40000/sec.
(I've not taken deltas of these, just removed all the zeros and
prefixed each non-zero value with the cpu number.)
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234754517
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234767945
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234802555
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234843542
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234887963
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234928204
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:234966428
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235009505
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235052740
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235093254
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235133299
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235173151
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235212387
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235252403
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235317928
86: IR-PCI-MSI 1050625-edge em2-rx-1 8:13 14:235371301
RSS is enabled, but I've used 'ethtool -X em2 equal 1' to
put everything through ring 0.
CPU 14 is still 25% idle, and that is the busiest CPU.
I've discovered that the 'lost packet' rate does depend on
the number of rx buffers configured with 'ethtool -G em2 rx nnnn'.
The traces above are with 1000 rx buffers.
I'm also slightly confused about the receive buffers.
As I read the code the following happens (ignoring jumbo
buffers, which I don't have configured):
AFAICT all the rings have 2048 entries.
With RSS there are 4 pairs of rings: one of each pair contains
(free) buffers, the other the receive data status.
The receive code processes an entry from the status ring
and puts a buffer back onto the corresponding buffers ring.
Since the hardware only takes buffers from one ring, the
driver moves all the free buffers from rings 1-3 onto ring 0.
When the rings are allocated I think that buffers (default 200)
are added to all 4 rings.
As soon as the 'napi' code for ring 0 runs, it collects the
other 600 buffers and puts them on its own (free) buffer ring.
This seems to make all 800 buffers available for any of the RSS
channels.
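To check my understanding, here's a small userspace model of that
pooling. The structure and all the names are mine, not the driver's,
so treat it as a sketch of my reading rather than of what tg3
actually does:

#include <stdio.h>

#define NRINGS          4
#define RING_ENTRIES    2048

/* Free-buffer ring, modelled as bare producer/consumer indices. */
struct free_ring {
        unsigned int prod;      /* buffers posted */
        unsigned int cons;      /* buffers taken: by the hw on ring 0,
                                 * by napi[0]'s drain on rings 1-3 */
};

static struct free_ring ring[NRINGS];

/* What I believe napi[0] does: move every free buffer that
 * rings 1-3 hold onto ring 0, the only ring the hw refills from. */
static void drain_to_ring0(void)
{
        unsigned int r;

        for (r = 1; r < NRINGS; r++)
                while (ring[r].cons != ring[r].prod) {
                        ring[0].prod++;         /* note: no bound check */
                        ring[r].cons++;
                }
}

int main(void)
{
        unsigned int r;

        /* allocation: the default 200 buffers posted on each ring */
        for (r = 0; r < NRINGS; r++)
                ring[r].prod = 200;

        drain_to_ring0();
        printf("ring 0 holds %u buffers\n", ring[0].prod - ring[0].cons);
        return 0;       /* prints 800 */
}

With the default sizing that's fine: 800 is well under 2048, so the
whole pool ends up on ring 0.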
Now if I configure 'ethtool -G em2 rx 2000', a total of 8000
receive buffers are allocated.
Only 2047 will fit into ring[0], so the other 'buffer' rings
still contain buffers.
If I then receive traffic that goes to ring[3], the free buffer
ring[3] will wrap, discarding 2048 buffers.
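A quick numbers-only sketch of that overflow (my reading, not
driver code):

#include <stdio.h>

int main(void)
{
        unsigned int ring_entries = 2048;       /* all rings, AFAICT */
        unsigned int per_ring = 2000;           /* ethtool -G em2 rx 2000 */
        unsigned int nrings = 4;

        unsigned int total = nrings * per_ring; /* 8000 allocated */
        unsigned int fit = ring_entries - 1;    /* 2047 fit on ring 0 */

        printf("%u allocated, %u on ring 0, %u stranded on rings 1-3\n",
               total, fit, total - fit);
        /* With ring[3] still mostly full of free buffers, recycling
         * rx completions onto it advances the producer past the
         * (stalled) consumer, overwriting - i.e. losing - the 2048
         * buffer pointers it laps. */
        return 0;
}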
I'm assuming I've missed something?
This bit of code in tg3_rx() also looks buggy:
if (unlikely(rx_std_posted >= tp->rx_std_max_post)) {
        tpr->rx_std_prod_idx = std_prod_idx &
                               tp->rx_std_ring_mask;
        tw32_rx_mbox(TG3_RX_STD_PROD_IDX_REG,
                     tpr->rx_std_prod_idx);
        work_mask &= ~RXD_OPAQUE_RING_STD;
        rx_std_posted = 0;
}
Clearing the bit in work_mask stops napi[0] from being run to
move the freed buffers across.
(I don't think I have the hardware that goes through that bit.)
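Here is a compilable sketch of that interaction - everything except
RXD_OPAQUE_RING_STD is a placeholder name of mine, and the loop is
only the shape of tg3_rx(), not its code:

#include <stdio.h>

#define RXD_OPAQUE_RING_STD     1u

static void post_producer_index(void)  { puts("producer index posted"); }
static void schedule_refill_napi(void) { puts("napi[0] scheduled"); }

static void rx_loop_sketch(unsigned int npkts, unsigned int max_post)
{
        unsigned int work_mask = 0, rx_std_posted = 0, i;

        for (i = 0; i < npkts; i++) {
                work_mask |= RXD_OPAQUE_RING_STD;  /* buffer recycled */

                if (++rx_std_posted >= max_post) {
                        post_producer_index();     /* mid-loop flush */
                        /* the clear quoted above: */
                        work_mask &= ~RXD_OPAQUE_RING_STD;
                        rx_std_posted = 0;
                }
        }

        /* After the loop: a zero work_mask means the refill napi is
         * never kicked, so freed buffers stay stranded on rings 1-3. */
        if (work_mask)
                schedule_refill_napi();
}

int main(void)
{
        /* flush lands on the last packet: "napi[0] scheduled"
         * never prints */
        rx_loop_sketch(16, 8);
        return 0;
}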
David