Message-ID: <CANn89iJ95S3ia=G7uJb-jGnnaJiQcMVHGEpnKMWc=QZh5tUS=w@mail.gmail.com>
Date: Thu, 13 Nov 2025 10:08:29 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: kuba@...nel.org, davem@...emloft.net, pabeni@...hat.com, horms@...nel.org,
xiyou.wangcong@...il.com, jiri@...nulli.us, kuniyu@...gle.com,
willemb@...gle.com, netdev@...r.kernel.org, eric.dumazet@...il.com,
hawk@...nel.org, patchwork-bot+netdevbpf@...nel.org, toke@...hat.com
Subject: Re: [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches
On Thu, Nov 13, 2025 at 9:53 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> [..]
> Eric,
>
> So you are correct that requeues exist even before your changes to
> speed up the tx path - two machines, one with 6.5 and another with a
> 6.8 variant, exhibit this phenomenon with very low traffic... which
> got me a little curious.
> My initial thought was that perhaps it was related to the mq/fq_codel
> combo, but a short run shows requeues occur on a couple of other
> qdiscs (e.g. prio) and mq children (e.g. pfifo), which rules out
> fq_codel as a contributor to the requeues.
> For example, on this NUC I am typing on right now, after changing the
> root qdisc:
>
> --
> $ uname -r
> 6.8.0-87-generic
> $
> qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> 0 1 1 1 1 1 1 1 1
> Sent 360948039 bytes 1015807 pkt (dropped 0, overlimits 0 requeues 1528)
> backlog 0b 0p requeues 1528
> ---
>
> and 20-30 seconds later:
> ---
> qdisc prio 8004: dev eno1 root refcnt 5 bands 8 priomap 1 2 2 2 1 2 0
> 0 1 1 1 1 1 1 1 1
> Sent 361867275 bytes 1017386 pkt (dropped 0, overlimits 0 requeues 1531)
> backlog 0b 0p requeues 1531
> ----
>
> Real cheap NIC doing 1G with 4 tx rings:
> ---
> $ ethtool -i eno1
> driver: igc
> version: 6.8.0-87-generic
> firmware-version: 1085:8770
> expansion-rom-version:
> bus-info: 0000:02:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> $ ethtool eno1
> Settings for eno1:
> Supported ports: [ TP ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> 2500baseT/Full
> Supported pause frame use: Symmetric
> Supports auto-negotiation: Yes
> Supported FEC modes: Not reported
> Advertised link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> 2500baseT/Full
> Advertised pause frame use: Symmetric
> Advertised auto-negotiation: Yes
> Advertised FEC modes: Not reported
> Speed: 1000Mb/s
> Duplex: Full
> Auto-negotiation: on
> Port: Twisted Pair
> PHYAD: 0
> Transceiver: internal
> MDI-X: off (auto)
> netlink error: Operation not permitted
> Current message level: 0x00000007 (7)
> drv probe link
> Link detected: yes
> ----
>
> Requeues should only happen if the driver is overwhelmed on the tx
> side - i.e. the chosen tx ring has no more space. Back in the day,
> this was not a very common event.
> Several explanations could justify seeing it today: a) modern
> processors are faster; b) the tx code path has become more efficient
> (true from inspection and your results, but those patches are not on
> my small systems); c) (unlikely, but) we are misaccounting requeues
> (need to look at the code); d) the driver is too eager to return
> NETDEV_TX_BUSY.
>
> Thoughts?
Requeues can happen because some drivers do not use skb->len for the
BQL budget, but something bigger for GSO packets, because they want to
account for the (N) replicated headers.
So the core networking stack can pull too many packets from the qdisc
for one xmit_more batch; then ndo_start_xmit() at some point stops the
queue before the end of the batch, because the BQL limit is hit
sooner.
I think drivers should not be overzealous: BQL is best effort, we do
not care about the extra headers.
drivers/net/ethernet/intel/igc/igc_main.c is one of the overzealous drivers ;)
igc_tso() ...
/* update gso size and bytecount with header size */
first->gso_segs = skb_shinfo(skb)->gso_segs;
first->bytecount += (first->gso_segs - 1) * *hdr_len;
>
> We will run some forwarding performance tests and let you know if we
> spot anything..
>
> cheers,
> jamal