Message-ID: <87seemm8eb.fsf@toke.dk>
Date: Mon, 10 Nov 2025 12:31:08 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>, Jonas Köppeler
<j.koeppeler@...berlin.de>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang
<xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, Kuniyuki
Iwashima <kuniyu@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption

Eric Dumazet <edumazet@...gle.com> writes:
> On Sun, Nov 9, 2025 at 12:18 PM Eric Dumazet <edumazet@...gle.com> wrote:
>>
>
>> I think the issue is really about TCQ_F_ONETXQUEUE :
>
> dequeue_skb() can only dequeue 8 packets at a time, then has to
> release the qdisc spinlock.

So after looking at this a bit more, I think I understand more or less
what's going on in the interaction between cake and your llist patch:

Basically, the llist patch moves the bottleneck from qdisc enqueue to
qdisc dequeue (in the setup we're testing, where the actual link speed
is not itself a bottleneck). Before, enqueue contends with dequeue on
the qdisc lock, meaning dequeue has no trouble keeping up, and the
qdisc never fills up.

With the llist patch, suddenly we're enqueueing a whole batch of
packets every time we take the lock, which means that dequeue can no
longer keep up, making it the bottleneck.
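
To spell out the difference between the two paths (pseudo-code only,
not the actual net/core/dev.c logic; the ll_node field and the
splice_defer_list() helper are made-up names for illustration):

	/* Old path: every sender takes the qdisc root lock per packet, so
	 * enqueue and dequeue contend 1:1 and dequeue easily keeps up. */
	spin_lock(root_lock);
	rc = q->enqueue(skb, q, &to_free);
	qdisc_run(q);			/* dequeue while we hold the lock */
	spin_unlock(root_lock);

	/* llist path: senders push lock-free onto the deferred list, and
	 * only the CPU that found the list empty takes the lock, splicing
	 * the whole accumulated batch into the qdisc in one go. A single
	 * dequeue pass then faces a burst of packets and becomes the
	 * bottleneck. */
	if (llist_add(&skb->ll_node, &q->defer_list)) {	/* true if list was empty */
		spin_lock(root_lock);
		splice_defer_list(q);	/* enqueue the whole batch */
		qdisc_run(q);		/* dequeues at most a quantum per lock hold */
		spin_unlock(root_lock);
	}
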
The complete collapse in throughput comes from the way cake deals with
unresponsive flows once the qdisc fills up: the BLUE part of its AQM
will drive up its drop probability to 1, where it will stay until the
flow responds (which, in this case, it never does).
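
For reference, the BLUE half of cake's COBALT AQM behaves roughly like
the sketch below (simplified, not the exact sch_cake.c code; the
struct and the P_INC/P_DEC constants are illustrative stand-ins):

	/* p_drop is a fixed-point drop probability; U32_MAX means "drop
	 * everything". Constants and struct are illustrative only. */
	#define P_INC	(1U << 24)
	#define P_DEC	(1U << 20)

	struct blue_vars { u32 p_drop; };	/* stand-in for cake's per-flow state */

	static void blue_queue_full(struct blue_vars *vars)
	{
		/* Called when the queue overflows: ratchet p_drop up towards 1. */
		if (vars->p_drop < U32_MAX - P_INC)
			vars->p_drop += P_INC;
		else
			vars->p_drop = U32_MAX;
	}

	static void blue_queue_empty(struct blue_vars *vars)
	{
		/* Only called when the queue actually drains. */
		if (vars->p_drop > P_DEC)
			vars->p_drop -= P_DEC;
		else
			vars->p_drop = 0;
	}

	static bool blue_should_drop(const struct blue_vars *vars)
	{
		return get_random_u32() < vars->p_drop;
	}

The only way back down is through the queue-empty path, so an
unresponsive flood that keeps the queue full leaves p_drop pinned at
1, and everything it sends gets dropped; that's the collapse we're
seeing.
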
Turning off the BLUE algorithm prevents the throughput collapse;
there's still a throughput delta compared to a stock 6.17 kernel,
which I think is because cake is simply quite inefficient at dropping
packets in an overload situation. I'll experiment with a variant of
the bulk dropping you introduced to fq_codel and see if that helps. We
should probably also cap the drop probability of BLUE to something
lower than 1.
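
What I have in mind for the bulk dropping is roughly the sketch below
(untested; cake_drop_bulk() and cake_fattest_flow() are made-up names,
the pattern is the one fq_codel_drop() uses). Capping BLUE would just
be a matter of clamping p_drop to something below U32_MAX in the
increment path.

	/* Find the fattest flow once, then drop a whole batch of packets
	 * from its head under that single lookup, instead of paying the
	 * per-packet overhead for every drop. */
	static unsigned int cake_drop_bulk(struct Qdisc *sch, unsigned int batch,
					   struct sk_buff **to_free)
	{
		struct cake_sched_data *q = qdisc_priv(sch);
		struct cake_flow *flow = cake_fattest_flow(q);	/* made-up helper */
		unsigned int dropped = 0;
		struct sk_buff *skb;

		while (dropped < batch && (skb = dequeue_head(flow))) {
			qdisc_qstats_backlog_dec(sch, skb);
			__qdisc_drop(skb, to_free);	/* freed in bulk by the caller */
			dropped++;
		}
		sch->q.qlen -= dropped;
		return dropped;
	}

The point is to pay the fattest-flow lookup once per batch instead of
once per dropped packet.
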
The patch you sent (below) does not in itself help anything, but
lowering the constant to 8 instead of 256 does help. I'm not sure we
want something that low, though; probably better to fix the behaviour
of cake, no?

-Toke
>> Perhaps we should not accept q->limit packets in the ll_list, but a
>> much smaller limit.
>
> I will test something like this
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 69515edd17bc6a157046f31b3dd343a59ae192ab..e4187e2ca6324781216c073de2ec20626119327a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4185,8 +4185,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	first_n = READ_ONCE(q->defer_list.first);
>  	do {
>  		if (first_n && !defer_count) {
> +			unsigned long total;
> +
>  			defer_count = atomic_long_inc_return(&q->defer_count);
> -			if (unlikely(defer_count > q->limit)) {
> +			total = defer_count + READ_ONCE(q->q.qlen);
> +
> +			if (unlikely(defer_count > 256 || total > READ_ONCE(q->limit))) {
>  				kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
>  				return NET_XMIT_DROP;
>  			}