Open Source and information security mailing list archives
Message-ID: <CANn89iKDx52BnKZhw=hpCCG1dHtXOGx8pbynDoFRE0h_+a7JhQ@mail.gmail.com>
Date: Sun, 9 Nov 2025 11:28:33 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jonas Köppeler <j.koeppeler@...berlin.de>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>, 
	"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>, 
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption

On Sun, Nov 9, 2025 at 11:18 AM Jonas Köppeler <j.koeppeler@...berlin.de> wrote:
>
> On 11/9/25 5:33 PM, Toke Høiland-Jørgensen wrote:
> > Not sure why there's this difference between your setup or mine; some
> > .config or hardware difference related to the use of atomics? Any other
> > ideas?
>
> Hi Eric, hi Toke,
>
> I observed similar behavior: CAKE's throughput collapses after the patch.
>
> Test setup:
> - 4-queue CAKE root qdisc

Please send

tc -s -d qd sh


> - 64-byte packets at ~21 Mpps
> - Intel Xeon Gold 6209U + 25GbE Intel XXV710 NIC
> - DuT forwards incoming traffic back to traffic generator through cake
>
> Throughput over 10 seconds before/after patch:
>
> Before patch:
> 0.475   mpps
> 0.481   mpps
> 0.477   mpps
> 0.478   mpps
> 0.478   mpps
> 0.477   mpps
> 0.479   mpps
> 0.481   mpps
> 0.481   mpps
>
> After patch:
> 0.265  mpps
> 0.035  mpps
> 0.003  mpps
> 0.002  mpps
> 0.001  mpps
> 0.002  mpps
> 0.002  mpps
> 0.002  mpps
> 0.002  mpps
>
> ---
>
>
>  From the qdisc I also see a large number of drops. Running:
>
>      perf record -a -e skb:kfree_skb
>
> shows `QDISC_OVERLIMIT` and `CAKE_FLOOD` as the drop reasons.


Cake drops packets from dequeue() while the qdisc spinlock is held,
unfortunately.

So it is quite possible that feeding more packets to the qdisc than before
puts it in a mode where dequeue() has to drop more packets, slowing
the whole thing down.

Presumably cake's enqueue() should 'drop' packets when the queue is
under high pressure, because enqueue() can drop a packet without
holding the qdisc spinlock.


>
> `tc` statistics before/after the patch:
>
> Before patch:
> - drops: 32
> - packets: 4,786,109
> - memory_used: 8,916,480
> - requeues: 254
>
> After patch:
> - drops: 13,601,075
> - packets: 322,540
> - memory_used: 15,504,576
> - requeues: 273
>
> ---
>
> Call graph of `__dev_queue_xmit` after the patch (CPU time percentages):
>
> 53.37%  __dev_queue_xmit
>    21.02%  __qdisc_run
>      13.79%  sch_direct_xmit
>        12.01%  _raw_spin_lock
>          11.30%  do_raw_spin_lock
>            11.06%  __pv_queued_spin_lock_slowpath
>      0.73%  _raw_spin_unlock
>        0.58%  lock_release
>      0.69%  dev_hard_start_xmit
>      6.91%  cake_dequeue
>        1.82%  sk_skb_reason_drop
>          1.10%  skb_release_data
>          0.65%  kfree_skbmem
>            0.61%  kmem_cache_free
>        1.64%  get_random_u32
>        0.97%  ktime_get
>          0.86%  seqcount_lockdep_reader_access.constprop.0
>        0.91%  cake_dequeue_one
>    16.49%  _raw_spin_lock
>      15.71%  do_raw_spin_lock
>        15.54%  __pv_queued_spin_lock_slowpath
>    10.00%  dev_qdisc_enqueue
>      9.94%  cake_enqueue
>        4.90%  cake_hash
>        2.85%  __skb_flow_dissect
>          1.08%  lock_acquire
>          0.65%  lock_release
>        1.17%  __siphash_unaligned
>        2.20%  ktime_get
>          1.94%  seqcount_lockdep_reader_access.constprop.0
>        0.69%  cake_get_flow_quantum / get_random_u16
>    1.99%  netdev_core_pick_tx
>      1.79%  i40e_lan_select_queue
>      1.62%  netdev_pick_tx
>        0.78%  lock_acquire
>        0.52%  lock_release
>      0.82%  lock_acquire
>    0.76%  kfree_skb_list_reason
>      0.52%  skb_release_data
>    1.02%  lock_acquire
>      0.63%  lock_release
>
> ---
>
> The `_raw_spin_lock` portion under `__qdisc_run -> sch_direct_xmit` is slightly higher after the patch than before (12.01% vs. 5.68%).
> It feels like once sch_cake starts dropping packets (due to overlimit and cobalt drops), the throughput collapses. Could it be that the overlimit
> is reached "faster" when there are more CPUs trying to enqueue packets, hitting cake's queue limit due to the "batch" enqueue behavior,
> which then leads to cake starting to drop packets?
>

Yes, probably.
