Message-ID: <CANn89iKhPJZGWUBD0-szseVyU6-UpLWP11ZG0=bmqtpgVGpQaw@mail.gmail.com>
Date: Sun, 9 Nov 2025 12:18:06 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jonas Köppeler <j.koeppeler@...berlin.de>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>, 
	"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>, 
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption

On Sun, Nov 9, 2025 at 11:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sun, Nov 9, 2025 at 11:18 AM Jonas Köppeler <j.koeppeler@...berlin.de> wrote:
> >
> > On 11/9/25 5:33 PM, Toke Høiland-Jørgensen wrote:
> > > Not sure why there's this difference between your setup and mine; some
> > > .config or hardware difference related to the use of atomics? Any other
> > > ideas?
> >
> > Hi Eric, hi Toke,
> >
> > I observed a similar behavior where CAKE's throughput collapses after the patch.
> >
> > Test setup:
> > - 4 queues CAKE root qdisc
>
> Please send
>
> tc -s -d qd sh
>
>
> > - 64-byte packets at ~21 Mpps
> > - Intel Xeon Gold 6209U + 25GbE Intel XXV710 NIC
> > - DuT forwards incoming traffic back to the traffic generator through cake
> >
> > Throughput over 10 seconds before/after patch:
> >
> > Before patch:
> > 0.475   mpps
> > 0.481   mpps
> > 0.477   mpps
> > 0.478   mpps
> > 0.478   mpps
> > 0.477   mpps
> > 0.479   mpps
> > 0.481   mpps
> > 0.481   mpps
> >
> > After patch:
> > 0.265  mpps
> > 0.035  mpps
> > 0.003  mpps
> > 0.002  mpps
> > 0.001  mpps
> > 0.002  mpps
> > 0.002  mpps
> > 0.002  mpps
> > 0.002  mpps
> >
> > ---
> >
> >
> > From the qdisc I also see a large number of drops. Running:
> >
> >      perf record -a -e skb:kfree_skb
> >
> > shows `QDISC_OVERLIMIT` and `CAKE_FLOOD` as the drop reasons.
>
>
> Cake drops packets from dequeue() while the qdisc spinlock is held,
> unfortunately.
>
> So it is quite possible that feeding more packets to the qdisc than before
> pushes it into a mode where dequeue() has to drop more packets and slows
> the whole thing down.
>
> Presumably cake enqueue() should 'drop' the packet when the queue is
> under high pressure,
> because enqueue() can drop the packet without holding the qdisc spinlock.
>
>
> >
> > `tc` statistics before/after the patch:
> >
> > Before patch:
> > - drops: 32
> > - packets: 4,786,109
> > - memory_used: 8,916,480
> > - requeues: 254
> >
> > After patch:
> > - drops: 13,601,075
> > - packets: 322,540
> > - memory_used: 15,504,576
> > - requeues: 273
> >
> > ---
> >
> > Call graph of `__dev_queue_xmit` after the patch (CPU time percentages):
> >
> > 53.37%  __dev_queue_xmit
> >    21.02%  __qdisc_run
> >      13.79%  sch_direct_xmit
> >        12.01%  _raw_spin_lock
> >          11.30%  do_raw_spin_lock
> >            11.06%  __pv_queued_spin_lock_slowpath
> >      0.73%  _raw_spin_unlock
> >        0.58%  lock_release
> >      0.69%  dev_hard_start_xmit
> >      6.91%  cake_dequeue
> >        1.82%  sk_skb_reason_drop
> >          1.10%  skb_release_data
> >          0.65%  kfree_skbmem
> >            0.61%  kmem_cache_free
> >        1.64%  get_random_u32
> >        0.97%  ktime_get
> >          0.86%  seqcount_lockdep_reader_access.constprop.0
> >        0.91%  cake_dequeue_one
> >    16.49%  _raw_spin_lock
> >      15.71%  do_raw_spin_lock
> >        15.54%  __pv_queued_spin_lock_slowpath
> >    10.00%  dev_qdisc_enqueue
> >      9.94%  cake_enqueue
> >        4.90%  cake_hash
> >        2.85%  __skb_flow_dissect
> >          1.08%  lock_acquire
> >          0.65%  lock_release
> >        1.17%  __siphash_unaligned
> >        2.20%  ktime_get
> >          1.94%  seqcount_lockdep_reader_access.constprop.0
> >        0.69%  cake_get_flow_quantum / get_random_u16
> >    1.99%  netdev_core_pick_tx
> >      1.79%  i40e_lan_select_queue
> >      1.62%  netdev_pick_tx
> >        0.78%  lock_acquire
> >        0.52%  lock_release
> >      0.82%  lock_acquire
> >    0.76%  kfree_skb_list_reason
> >      0.52%  skb_release_data
> >    1.02%  lock_acquire
> >      0.63%  lock_release
> >
> > ---
> >
> > The `_raw_spin_lock` portion under `__qdisc_run -> sch_direct_xmit` is noticeably higher after the patch than before (5.68% vs 12.01%).
> > It feels like once sch_cake starts dropping packets (due to overlimit and cobalt drops), the throughput collapses. Could it be that the
> > overlimit is reached "faster" when more CPUs are trying to enqueue packets, so cake's queue limit is hit sooner because of the "batch"
> > enqueue behavior, which then leads to cake dropping packets?
> >
>
> Yes, probably.
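
Coming back to the enqueue-side drop point above, something like this inside
net/sched/sch_cake.c (untested sketch only; buffer_used/buffer_limit are the
existing fields, but the 2x pressure threshold is an arbitrary guess):

/* Untested sketch: shed packets already at enqueue time when the backlog
 * is far over the configured buffer limit.  qdisc_drop() puts the skb on
 * the *to_free list, so the actual freeing happens in the caller after
 * the qdisc root lock has been released, instead of inside cake_dequeue()
 * with the lock held.
 */
static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			struct sk_buff **to_free)
{
	struct cake_sched_data *q = qdisc_priv(sch);

	if (unlikely(q->buffer_used + skb->truesize > 2 * q->buffer_limit))
		return qdisc_drop(skb, sch, to_free);

	/* ... existing hashing, flow classification and AQM setup ... */
	return NET_XMIT_SUCCESS;
}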

I think the issue is really about TCQ_F_ONETXQUEUE:

Perhaps we should not accept q->limit packets in the ll_list, but a
much smaller limit.
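
Something along these lines, purely as a sketch (qdisc_defer and
QDISC_DEFER_MAX are hypothetical names, not from the series, and "node"
stands for wherever the series keeps the llist node for the skb; the point
is just to cap the deferred batch well below q->limit so the producers
drop early instead of letting cake_dequeue() do all the dropping under
the lock):

#define QDISC_DEFER_MAX	64	/* hypothetical cap, much smaller than q->limit */

/* Hypothetical per-qdisc state; the real series embeds its own fields
 * in struct Qdisc for the lock-free list. */
struct qdisc_defer {
	struct llist_head head;	/* skbs waiting for the lock owner */
	atomic_t cnt;		/* current depth of that list */
};

/* Producer side: only put the skb on the lock-free list if the deferred
 * batch is still small; otherwise drop it right here, before it can pile
 * up behind the qdisc lock. */
static int qdisc_defer_or_drop(struct sk_buff *skb, struct llist_node *node,
			       struct qdisc_defer *d)
{
	if (atomic_inc_return(&d->cnt) > QDISC_DEFER_MAX) {
		atomic_dec(&d->cnt);
		kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_OVERLIMIT);
		return NET_XMIT_DROP;
	}
	llist_add(node, &d->head);
	return NET_XMIT_SUCCESS;
}

/* The consumer (the CPU holding the root lock) would llist_del_all(),
 * run the real ->enqueue() for each entry and decrement the count. */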
