lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMkw6mKtk-=bRQtjWsTphJHNJ0j4Dk1beYS181c5SHZv4Q@mail.gmail.com>
Date: Thu, 13 Nov 2025 13:55:31 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: kuba@...nel.org, davem@...emloft.net, pabeni@...hat.com, horms@...nel.org, 
	xiyou.wangcong@...il.com, jiri@...nulli.us, kuniyu@...gle.com, 
	willemb@...gle.com, netdev@...r.kernel.org, eric.dumazet@...il.com, 
	hawk@...nel.org, patchwork-bot+netdevbpf@...nel.org, toke@...hat.com
Subject: Re: [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches

On Thu, Nov 13, 2025 at 1:36 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Thu, Nov 13, 2025 at 10:30 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > On Thu, Nov 13, 2025 at 1:08 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 9:53 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > > >
> > > > [..]
> > > > Eric,
> > > >
> > > > So you are correct that requeues exist even before your changes to
> > > > speed up the tx path - two machines one with 6.5 and another with 6.8
> > > > variant exhibit this phenoma with very low traffic... which got me a
> > > > little curious.
> > > > My initial thought was perhaps it was related to mq/fqcodel combo but
> > > > a short run shows requeues occur on a couple of other qdiscs (ex prio)
> > > > and mq children (e.g., pfifo), which rules out fq codel as a
> > > > contributor to the requeues.
> > > > Example, this NUC i am typing on right now, after changing the root qdisc:
> > > >
> > > > --
> > > > $ uname -r
> > > > 6.8.0-87-generic
> > > > $
> > > > qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> > > > 0 1 1 1 1 1 1 1 1
> > > >  Sent 360948039 bytes 1015807 pkt (dropped 0, overlimits 0 requeues 1528)
> > > >  backlog 0b 0p requeues 1528
> > > > ---
> > > >
> > > > and 20-30  seconds later:
> > > > ---
> > > > qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> > > > 0 1 1 1 1 1 1 1 1
> > > >  Sent 361867275 bytes 1017386 pkt (dropped 0, overlimits 0 requeues 1531)
> > > >  backlog 0b 0p requeues 1531
> > > > ----
> > > >
> > > > Reel cheep NIC doing 1G with 4 tx rings:
> > > > ---
> > > > $ ethtool -i eno1
> > > > driver: igc
> > > > version: 6.8.0-87-generic
> > > > firmware-version: 1085:8770
> > > > expansion-rom-version:
> > > > bus-info: 0000:02:00.0
> > > > supports-statistics: yes
> > > > supports-test: yes
> > > > supports-eeprom-access: yes
> > > > supports-register-dump: yes
> > > > supports-priv-flags: yes
> > > >
> > > > $ ethtool eno1
> > > > Settings for eno1:
> > > > Supported ports: [ TP ]
> > > > Supported link modes:   10baseT/Half 10baseT/Full
> > > >                         100baseT/Half 100baseT/Full
> > > >                         1000baseT/Full
> > > >                         2500baseT/Full
> > > > Supported pause frame use: Symmetric
> > > > Supports auto-negotiation: Yes
> > > > Supported FEC modes: Not reported
> > > > Advertised link modes:  10baseT/Half 10baseT/Full
> > > >                         100baseT/Half 100baseT/Full
> > > >                         1000baseT/Full
> > > >                         2500baseT/Full
> > > > Advertised pause frame use: Symmetric
> > > > Advertised auto-negotiation: Yes
> > > > Advertised FEC modes: Not reported
> > > > Speed: 1000Mb/s
> > > > Duplex: Full
> > > > Auto-negotiation: on
> > > > Port: Twisted Pair
> > > > PHYAD: 0
> > > > Transceiver: internal
> > > > MDI-X: off (auto)
> > > > netlink error: Operation not permitted
> > > >         Current message level: 0x00000007 (7)
> > > >                                drv probe link
> > > > Link detected: yes
> > > > ----
> > > >
> > > > Requeues should only happen if the driver is overwhelmed on the tx
> > > > side - i.e tx ring of choice has no more space. Back in the day, this
> > > > was not a very common event.
> > > > That can certainly be justified today with several explanations if: a)
> > > > modern processors getting faster b) the tx code path has become more
> > > > efficient (true from inspection and your results but those patches are
> > > > not on my small systems) c) (unlikely but) we are misaccounting for
> > > > requeues (need to look at the code). d) the driver is too eager to
> > > > return TX BUSY.
> > > >
> > > > Thoughts?
> > >
> > > requeues can happen because some drivers do not use skb->len for the
> > > BQL budget, but something bigger for GSO packets,
> > > because they want to account for the (N) headers.
> > >
> > > So the core networking stack could pull too many packets from the
> > > qdisc for one xmit_more batch,
> > > then ndo_start_xmit() at some point stops the queue before the end of
> > > the batch, because BQL limit is hit sooner.
> > >
> > > I think drivers should not be overzealous, BQL is a best effort, we do
> > > not care of extra headers.
> > >
> > > drivers/net/ethernet/intel/igc/igc_main.c is one of the overzealous drivers ;)
> > >
> > > igc_tso() ...
> > >
> > > /* update gso size and bytecount with header size */
> > > first->gso_segs = skb_shinfo(skb)->gso_segs;
> > > first->bytecount += (first->gso_segs - 1) * *hdr_len;
> > >
> >
> >
> > Ok, the 25G i40e driver we are going to run tests on seems to be
> > suffering from the same enthusiasm ;->
> > I guess the same codebase..
> > Very few drivers tho seem to be doing what you suggest. Of course idpf
> > being one of those ;->
>
> Note that few requeues are ok.
>
> In my case, I had 5 millions requeues per second, and at that point
> you start noticing something is wrong ;)

That's high ;-> For the nuc with igc, its <1%. Regardless, the
eagerness for TX BUSY implies reduced performance due to the early
bailout..

cheers,
jamal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ