Message-ID: <CANn89iKhPJZGWUBD0-szseVyU6-UpLWP11ZG0=bmqtpgVGpQaw@mail.gmail.com>
Date: Sun, 9 Nov 2025 12:18:06 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jonas Köppeler <j.koeppeler@...berlin.de>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
Kuniyuki Iwashima <kuniyu@...gle.com>, Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org,
eric.dumazet@...il.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption
On Sun, Nov 9, 2025 at 11:28 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sun, Nov 9, 2025 at 11:18 AM Jonas Köppeler <j.koeppeler@...berlin.de> wrote:
> >
> > On 11/9/25 5:33 PM, Toke Høiland-Jørgensen wrote:
> > > Not sure why there's this difference between your setup or mine; some
> > > .config or hardware difference related to the use of atomics? Any other
> > > ideas?
> >
> > Hi Eric, hi Toke,
> >
> > I observed a similar behavior where CAKE's throughput collapses after the patch.
> >
> > Test setup:
> > - 4 queues CAKE root qdisc
>
> Please send
>
> tc -s -d qd sh
>
>
> > - 64-byte packets at ~21 Mpps
> > - Intel Xeon Gold 6209U + 25GbE Intel XXV710 NIC
> > - DuT forwards incoming traffic back to traffic generator through cake
> >
> > Throughput over 10 seconds before/after patch:
> >
> > Before patch:
> > 0.475 Mpps
> > 0.481 Mpps
> > 0.477 Mpps
> > 0.478 Mpps
> > 0.478 Mpps
> > 0.477 Mpps
> > 0.479 Mpps
> > 0.481 Mpps
> > 0.481 Mpps
> >
> > After patch:
> > 0.265 Mpps
> > 0.035 Mpps
> > 0.003 Mpps
> > 0.002 Mpps
> > 0.001 Mpps
> > 0.002 Mpps
> > 0.002 Mpps
> > 0.002 Mpps
> > 0.002 Mpps
> >
> > ---
> >
> >
> > From the qdisc I also see a large number of drops. Running:
> >
> > perf record -a -e skb:kfree_skb
> >
> > shows `QDISC_OVERLIMIT` and `CAKE_FLOOD` as the drop reasons.
>
>
> Cake drops packets from dequeue() while the qdisc spinlock is held,
> unfortunately.
>
> So it is quite possible that feeding more packets to the qdisc than before
> pushes it into a mode where dequeue() has to drop more packets and slows
> down the whole thing.
>
> Presumably cake enqueue() should 'drop' the packet when the queue is
> under high pressure,
> because enqueue() can drop the packet without holding the qdisc spinlock.
>
>
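
To make that concrete, a completely untested sketch (example_sched_data,
buffer_used and buffer_pressure_limit are made-up placeholder names, not the
actual cake fields; a real change would also have to handle GSO splitting
and stats):

struct example_sched_data {
	u32 buffer_used;		/* bytes currently queued */
	u32 buffer_pressure_limit;	/* hypothetical threshold below the hard limit */
};

/* Early "under pressure" check at the top of ->enqueue(): the packet is
 * freed before any shared flow state is touched, instead of being dropped
 * later from ->dequeue() while the qdisc spinlock is held.
 */
static int example_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			   struct sk_buff **to_free)
{
	struct example_sched_data *q = qdisc_priv(sch);

	if (unlikely(q->buffer_used > q->buffer_pressure_limit)) {
		qdisc_qstats_overlimit(sch);
		return qdisc_drop(skb, sch, to_free);
	}

	/* ... normal classification / enqueue path ... */
	return NET_XMIT_SUCCESS;
}
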
> >
> > `tc` statistics before/after the patch:
> >
> > Before patch:
> > - drops: 32
> > - packets: 4,786,109
> > - memory_used: 8,916,480
> > - requeues: 254
> >
> > After patch:
> > - drops: 13,601,075
> > - packets: 322,540
> > - memory_used: 15,504,576
> > - requeues: 273
> >
> > ---
> >
> > Call graph of `__dev_queue_xmit` after the patch (CPU time percentages):
> >
> > 53.37% __dev_queue_xmit
> >   21.02% __qdisc_run
> >     13.79% sch_direct_xmit
> >       12.01% _raw_spin_lock
> >         11.30% do_raw_spin_lock
> >           11.06% __pv_queued_spin_lock_slowpath
> >       0.73% _raw_spin_unlock
> >         0.58% lock_release
> >       0.69% dev_hard_start_xmit
> >     6.91% cake_dequeue
> >       1.82% sk_skb_reason_drop
> >         1.10% skb_release_data
> >           0.65% kfree_skbmem
> >             0.61% kmem_cache_free
> >       1.64% get_random_u32
> >       0.97% ktime_get
> >         0.86% seqcount_lockdep_reader_access.constprop.0
> >       0.91% cake_dequeue_one
> >   16.49% _raw_spin_lock
> >     15.71% do_raw_spin_lock
> >       15.54% __pv_queued_spin_lock_slowpath
> >   10.00% dev_qdisc_enqueue
> >     9.94% cake_enqueue
> >       4.90% cake_hash
> >         2.85% __skb_flow_dissect
> >           1.08% lock_acquire
> >           0.65% lock_release
> >         1.17% __siphash_unaligned
> >       2.20% ktime_get
> >         1.94% seqcount_lockdep_reader_access.constprop.0
> >       0.69% cake_get_flow_quantum / get_random_u16
> >   1.99% netdev_core_pick_tx
> >     1.79% i40e_lan_select_queue
> >       1.62% netdev_pick_tx
> >         0.78% lock_acquire
> >         0.52% lock_release
> >   0.82% lock_acquire
> >   0.76% kfree_skb_list_reason
> >     0.52% skb_release_data
> >   1.02% lock_acquire
> >   0.63% lock_release
> >
> > ---
> >
> > The `_raw_spin_lock` portion under `__qdisc_run -> sch_direct_xmit` is noticeably higher after the patch than before (from 5.68% to 12.01%).
> > It feels like once sch_cake starts dropping packets (due to overlimit and cobalt drops), the throughput collapses. Could it be that the overlimit
> > is reached "faster" when there are more CPUs trying to enqueue packets, thus hitting cake's queue limit due to the "batch" enqueue behavior,
> > which then leads to cake dropping packets?
> >
>
> Yes, probably.

I think the issue is really about TCQ_F_ONETXQUEUE:
perhaps we should not accept up to q->limit packets in the ll_list, but a
much smaller limit.
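
Something along these lines, completely untested (defer_list, defer_count and
skb->ll_node are placeholder names for whatever the series actually uses, and
the cap value is arbitrary):

#define QDISC_DEFER_CAP	64	/* arbitrary, much smaller than q->limit */

	/* Contended __dev_xmit_skb() path: park at most a bounded number of
	 * packets on the llist.  Past that point, drop right away instead of
	 * letting the lock owner feed the whole backlog into ->enqueue() in
	 * one burst.  The lock owner would decrement defer_count as it
	 * splices packets off the list; drop accounting is omitted here.
	 */
	if (unlikely(atomic_inc_return(&q->defer_count) > QDISC_DEFER_CAP)) {
		atomic_dec(&q->defer_count);
		kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
		return NET_XMIT_DROP;
	}
	llist_add(&skb->ll_node, &q->defer_list);

The exact cap would need experimenting; the point is only that the amount of
work handed to the lock owner in one go stays bounded instead of growing to
q->limit.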