Message-ID: <11f331492498494584f171c4ab8dc733@AcuMS.aculab.com>
Date: Fri, 4 Feb 2022 10:18:02 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Alexander Duyck' <alexander.duyck@...il.com>,
Eric Dumazet <edumazet@...gle.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>
Subject: RE: [PATCH net-next 09/15] net: increase MAX_SKB_FRAGS
From: Alexander Duyck
> Sent: 03 February 2022 17:57
...
> > > So a big issue I see with this patch is the potential queueing issues
> > > it may introduce on Tx queues. I suspect it will cause a number of
> > > performance regressions and deadlocks as it will change the Tx queueing
> > > behavior for many NICs.
> > >
> > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of
> > > the ingredients for DESC_NEEDED in order to determine if the Tx queue
> > > needs to stop. With this change the value for igb for instance is
> > > jumping from 21 to 49, and the wake threshold is twice that, 98. As
> > > such the minimum Tx descriptor threshold for the driver would need to
> > > be updated beyond 80 otherwise it is likely to deadlock the first time
> > > it has to pause.
> >
> > Are these limits hard coded in Intel drivers and firmware, or do you
> > think this can be changed ?
>
> This is all code in the drivers. Most drivers have them as the logic
> is used to avoid having to return NETDEV_TX_BUSY. Basically the
> assumption is there is a 1:1 correlation between descriptors and
> individual frags. So most drivers would need to increase the size of
> their Tx descriptor rings if they were optimized for a lower value.

Maybe the drivers can be a little less conservative about the number
of fragments they expect in the next message?
There is little point requiring 49 free descriptors when the workload
never has more than 2 or 3 fragments.

Clearly you don't want to re-enable things unless there are enough
descriptors for an skb that has generated NETDEV_TX_BUSY, but the
current logic of 'trying to never actually return NETDEV_TX_BUSY'
is probably overcautious.
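
A minimal sketch of that idea, written as a user-space model rather than
driver code. The names (tx_ring, tx_ring_xmit, tx_ring_complete) are
hypothetical and not taken from any driver; the assumption is that the
queue is stopped only when a packet really does not fit, and is woken once
there is room for that particular packet instead of a worst case based on
MAX_SKB_FRAGS.

```c
#include <stdbool.h>

/* Hypothetical model of a Tx descriptor ring; not from any real driver. */
struct tx_ring {
	unsigned int free_desc;   /* descriptors currently unused */
	unsigned int wake_needed; /* descriptors the deferred packet needs */
	bool stopped;             /* models a stopped Tx queue */
};

/*
 * Try to queue a packet that needs 'needed' descriptors.
 * Returns false (the equivalent of NETDEV_TX_BUSY) when it does not fit,
 * and remembers how much room the deferred packet requires.
 */
static bool tx_ring_xmit(struct tx_ring *r, unsigned int needed)
{
	if (needed > r->free_desc) {
		r->stopped = true;
		r->wake_needed = needed;
		return false;
	}
	r->free_desc -= needed;
	return true;
}

/*
 * Tx completion: reclaim descriptors. Returns true when the queue should
 * be woken, which happens only once the deferred packet is sure to fit.
 */
static bool tx_ring_complete(struct tx_ring *r, unsigned int reclaimed)
{
	r->free_desc += reclaimed;
	if (r->stopped && r->free_desc >= r->wake_needed) {
		r->stopped = false;
		return true;
	}
	return false;
}
```

The trade-off is that the busy path is taken occasionally, in exchange for
not reserving dozens of descriptors that a workload with 2 or 3 fragments
per packet never uses.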

Does Linux allow an skb to have a lot of short fragments?
If dma_map isn't cheap (probably anything with an IOMMU or non-coherent
memory) then copying/merging short fragments into a pre-mapped
buffer can easily be faster.
Many years ago we found it was worth copying anything under 1k on
a sparc mbus+sbus system.

I don't think Linux can generate what I've seen elsewhere - the MAC
driver being asked to transmit something with 1000+ one byte fragments!
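
The copy/merge approach can be sketched as follows. This is a user-space
model under stated assumptions, not kernel code: struct frag, plan_tx and
COPY_THRESHOLD are hypothetical names, the 1024-byte threshold is borrowed
from the "under 1k" figure above and would need measuring on any given
platform, and the bounce buffer stands in for a pre-mapped DMA buffer.
Consecutive short fragments are copied into the bounce buffer and share one
descriptor; longer fragments get a descriptor each (where a real driver
would map them).

```c
#include <stddef.h>
#include <string.h>

#define COPY_THRESHOLD 1024 /* assumed value; tune per platform */

/* Hypothetical stand-in for one fragment of a packet. */
struct frag {
	const unsigned char *data;
	size_t len;
};

/*
 * Walk the fragments in order. Short fragments are copied into 'bounce'
 * (modelling a pre-mapped buffer), and each run of consecutive short
 * fragments costs a single descriptor. Anything else costs one descriptor
 * per fragment. Returns the number of descriptors required.
 */
static size_t plan_tx(const struct frag *frags, size_t nfrags,
		      unsigned char *bounce, size_t bounce_len)
{
	size_t ndesc = 0, used = 0;
	int in_run = 0;

	for (size_t i = 0; i < nfrags; i++) {
		size_t len = frags[i].len;

		if (len == 0)
			continue; /* nothing to send for this fragment */

		if (len < COPY_THRESHOLD && len <= bounce_len - used) {
			memcpy(bounce + used, frags[i].data, len);
			used += len;
			if (!in_run) {
				ndesc++; /* first fragment of a merged run */
				in_run = 1;
			}
		} else {
			ndesc++; /* would be mapped individually */
			in_run = 0;
		}
	}
	return ndesc;
}
```

With this model, a packet made of 1000 one-byte fragments needs a single
descriptor as long as the bounce buffer has room for it, instead of 1000.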
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)