Date:   Fri, 4 Feb 2022 10:18:02 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Alexander Duyck' <alexander.duyck@...il.com>,
        Eric Dumazet <edumazet@...gle.com>
CC:     Eric Dumazet <eric.dumazet@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>, Coco Li <lixiaoyan@...gle.com>
Subject: RE: [PATCH net-next 09/15] net: increase MAX_SKB_FRAGS

From: Alexander Duyck
> Sent: 03 February 2022 17:57
...
> > > So a big issue I see with this patch is the potential queueing issues
> > > it may introduce on Tx queues. I suspect it will cause a number of
> > > performance regressions and deadlocks as it will change the Tx queueing
> > > behavior for many NICs.
> > >
> > > As I recall many of the Intel drivers are using MAX_SKB_FRAGS as one of
> > > the ingredients for DESC_NEEDED in order to determine if the Tx queue
> > > needs to stop. With this change the value for igb for instance is
> > > jumping from 21 to 49, and the wake threshold is twice that, 98. As
> > > such the minimum Tx descriptor threshold for the driver would need to
> > > be updated beyond 80 otherwise it is likely to deadlock the first time
> > > it has to pause.
> >
> > Are these limits hard coded in Intel drivers and firmware, or do you
> > think this can be changed ?
> 
> This is all code in the drivers. Most drivers have them as the logic
> is used to avoid having to return NETDEV_TX_BUSY. Basically the
> assumption is there is a 1:1 correlation between descriptors and
> individual frags. So most drivers would need to increase the size of
> their Tx descriptor rings if they were optimized for a lower value.

Maybe the drivers can be a little less conservative about the number
of fragments they expect in the next message?
There is little point requiring 49 free descriptors when the workload
never has more than 2 or 3 fragments.

Clearly you don't want to re-enable things unless there are enough
descriptors for an skb that has generated NETDEV_TX_BUSY, but the
current logic of 'trying to never actually return NETDEV_TX_BUSY'
is probably overcautious.
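
To make the difference concrete, here is a rough sketch of the two
policies. It is not taken from any driver: the function names are
hypothetical, and the overhead of 4 extra descriptors per skb is an
assumption chosen only because it reproduces the 21 and 49 figures
quoted above (for 17 and 45 fragments respectively).

```c
#include <stdbool.h>

/* Assumed per-skb overhead (context, header, gaps); not from the thread. */
#define EXTRA_DESC 4u

/* Descriptors reserved when planning for the largest possible skb. */
static inline unsigned int desc_needed_worst_case(unsigned int max_frags)
{
	return max_frags + EXTRA_DESC;
}

/* Descriptors needed for one particular skb with nr_frags fragments. */
static inline unsigned int desc_needed_for_skb(unsigned int nr_frags)
{
	return nr_frags + EXTRA_DESC;
}

/* Conservative policy: stop the queue unless a worst-case skb would fit. */
static inline bool must_stop_conservative(unsigned int free_desc,
					  unsigned int max_frags)
{
	return free_desc < desc_needed_worst_case(max_frags);
}

/* Relaxed policy: stop only if the skb actually being sent does not fit. */
static inline bool must_stop_relaxed(unsigned int free_desc,
				     unsigned int nr_frags)
{
	return free_desc < desc_needed_for_skb(nr_frags);
}
```

With 30 free descriptors and a 45-fragment limit the conservative check
stops the queue, while the relaxed check still accepts a 3-fragment skb.
The relaxed form can of course hit NETDEV_TX_BUSY, so the wake side
would have to remember how many descriptors the rejected skb needed.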

Does Linux allow an skb to have a lot of short fragments?
If dma_map isn't cheap (probably anything with an iommu or non-coherent
memory) then copying/merging short fragments into a pre-mapped
buffer can easily be faster.
Many years ago we found it was worth copying anything under 1k on
a sparc mbus+sbus system.
I don't think Linux can generate what I've seen elsewhere - the MAC
driver being asked to transmit something with 1000+ one byte fragments!
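
As an illustration of the merging idea, the sketch below only counts
how many descriptors a fragment list would need if runs of short
fragments were copied into a pre-mapped buffer. Everything in it is
hypothetical: the function name, the 1024 byte cut-off (borrowed from
the "under 1k" figure above) and the bounce buffer size parameter. No
DMA mapping or copying is done, and it assumes the bounce buffer is at
least as large as the cut-off.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cut-off: fragments shorter than this are copied, not mapped. */
#define COPY_THRESHOLD 1024u

/*
 * Count the descriptors needed when consecutive short fragments are
 * coalesced, in order, into pre-mapped bounce buffers of bounce_size
 * bytes. Long fragments are mapped directly and use one descriptor each.
 */
static inline size_t descs_after_merge(const unsigned int *frag_len,
				       size_t nr_frags,
				       unsigned int bounce_size)
{
	size_t descs = 0;
	unsigned int fill = 0;	/* bytes used in the open bounce buffer */
	bool open = false;	/* is a bounce buffer currently open? */

	for (size_t i = 0; i < nr_frags; i++) {
		unsigned int len = frag_len[i];

		if (len >= COPY_THRESHOLD) {
			/* Long fragment: map it and close any open buffer. */
			descs++;
			open = false;
			continue;
		}
		if (!open || fill + len > bounce_size) {
			/* Start a new bounce buffer for this short run. */
			descs++;
			fill = 0;
			open = true;
		}
		fill += len;
	}
	return descs;
}
```

For fragment lengths {1, 1, 1, 4096, 1, 1} and a 2048 byte bounce
buffer this needs 3 descriptors instead of 6, and a long run of one
byte fragments collapses to a single descriptor per buffer's worth.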

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)