Message-ID: <CAM0EoMmPV8U3oNyf3D2F_RGzJgZQiMRBPq1ytokSLo6PcwFJpA@mail.gmail.com>
Date: Thu, 13 Nov 2025 13:30:30 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: kuba@...nel.org, davem@...emloft.net, pabeni@...hat.com, horms@...nel.org,
xiyou.wangcong@...il.com, jiri@...nulli.us, kuniyu@...gle.com,
willemb@...gle.com, netdev@...r.kernel.org, eric.dumazet@...il.com,
hawk@...nel.org, patchwork-bot+netdevbpf@...nel.org, toke@...hat.com
Subject: Re: [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches
On Thu, Nov 13, 2025 at 1:08 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Thu, Nov 13, 2025 at 9:53 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > [..]
> > Eric,
> >
> > So you are correct that requeues exist even before your changes to
> > speed up the tx path - two machines, one with 6.5 and another with a
> > 6.8 variant, exhibit this phenomenon with very low traffic... which
> > got me a little curious.
> > My initial thought was that it was related to the mq/fq_codel combo,
> > but a short run shows requeues occur on a couple of other qdiscs
> > (e.g., prio) and mq children (e.g., pfifo), which rules out fq_codel
> > as a contributor to the requeues.
> > Example: this NUC I am typing on right now, after changing the root qdisc:
> >
> > --
> > $ uname -r
> > 6.8.0-87-generic
> > $
> > qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> > 0 1 1 1 1 1 1 1 1
> > Sent 360948039 bytes 1015807 pkt (dropped 0, overlimits 0 requeues 1528)
> > backlog 0b 0p requeues 1528
> > ---
> >
> > and 20-30 seconds later:
> > ---
> > qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> > 0 1 1 1 1 1 1 1 1
> > Sent 361867275 bytes 1017386 pkt (dropped 0, overlimits 0 requeues 1531)
> > backlog 0b 0p requeues 1531
> > ----
> >
> > Real cheap NIC doing 1G with 4 tx rings:
> > ---
> > $ ethtool -i eno1
> > driver: igc
> > version: 6.8.0-87-generic
> > firmware-version: 1085:8770
> > expansion-rom-version:
> > bus-info: 0000:02:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> >
> > $ ethtool eno1
> > Settings for eno1:
> > Supported ports: [ TP ]
> > Supported link modes: 10baseT/Half 10baseT/Full
> > 100baseT/Half 100baseT/Full
> > 1000baseT/Full
> > 2500baseT/Full
> > Supported pause frame use: Symmetric
> > Supports auto-negotiation: Yes
> > Supported FEC modes: Not reported
> > Advertised link modes: 10baseT/Half 10baseT/Full
> > 100baseT/Half 100baseT/Full
> > 1000baseT/Full
> > 2500baseT/Full
> > Advertised pause frame use: Symmetric
> > Advertised auto-negotiation: Yes
> > Advertised FEC modes: Not reported
> > Speed: 1000Mb/s
> > Duplex: Full
> > Auto-negotiation: on
> > Port: Twisted Pair
> > PHYAD: 0
> > Transceiver: internal
> > MDI-X: off (auto)
> > netlink error: Operation not permitted
> > Current message level: 0x00000007 (7)
> > drv probe link
> > Link detected: yes
> > ----
> >
> > Requeues should only happen if the driver is overwhelmed on the tx
> > side - i.e., the chosen tx ring has no more space. Back in the day,
> > this was not a very common event.
> > Several explanations could justify it happening today: a) modern
> > processors have gotten faster; b) the tx code path has become more
> > efficient (true from inspection and your results, but those patches
> > are not on my small systems); c) (unlikely, but) we are misaccounting
> > requeues (need to look at the code); d) the driver is too eager to
> > return NETDEV_TX_BUSY.
> >
> > Thoughts?
>
> requeues can happen because some drivers do not use skb->len for the
> BQL budget, but something bigger for GSO packets,
> because they want to account for the (N) replicated headers.
>
> So the core networking stack can pull too many packets from the
> qdisc for one xmit_more batch;
> ndo_start_xmit() then stops the queue before the end of
> the batch, because the BQL limit is hit sooner than expected.
>
> I think drivers should not be overzealous; BQL is best effort, we do
> not care about the extra headers.
>
> drivers/net/ethernet/intel/igc/igc_main.c is one of the overzealous drivers ;)
>
> igc_tso() ...
>
> /* update gso size and bytecount with header size */
> first->gso_segs = skb_shinfo(skb)->gso_segs;
> first->bytecount += (first->gso_segs - 1) * *hdr_len;
>
Ok, the 25G i40e driver we are going to run tests on seems to be
suffering from the same enthusiasm ;->
I guess it shares the same codebase..
Very few drivers, though, seem to be doing what you suggest - idpf, of
course, being one of those ;->
cheers,
jamal
> > We will run some forwarding performance tests and let you know if we
> > spot anything..
> >
> > cheers,
> > jamal