netdev - Re: [PATCH v1 net-next 5/5] net: dev_queue

Message-ID: <CANn89iJG=EszxP0GDd_jO9db6WxajRQ03gzVYZGF1CM8Dng90Q@mail.gmail.com>
Date: Sun, 9 Nov 2025 09:14:23 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, Kuniyuki Iwashima <kuniyu@...gle.com>, 
	Willem de Bruijn <willemb@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption

On Sun, Nov 9, 2025 at 8:33 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Eric Dumazet <edumazet@...gle.com> writes:
>
> > On Sun, Nov 9, 2025 at 2:09 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >>
> >>
> >> This might be something related to XDP, because I ran the following
> >> test (IDPF, 32 TX queues)
> >>
> >> tc qd replace dev eth1 root cake
> >> ./super_netperf 16 -H tjbp27 -t UDP_STREAM -l 1000 -- -m 64 -Nn &
> >>
> >> Before my series : ~360 Kpps
> >> After my series : ~550 Kpps
> >
> > Or ... being faster uncovered an old qdisc bug.
> >
> > I mentioned the 'requeues' because I have seen this counter lately,
> > and was wondering if this could
> > be a driver bug.
> >
> > It seems the bug is in generic qdisc code: try_bulk_dequeue_skb() is
> > trusting BQL, but can not see the driver might block before BQL.
> >
> >  I am testing the following patch, it would be great if this solution
> > works for you.
>
> That does not seem to make any difference. I am not really seeing any
> requeues either, just a whole bunch of drops:
>
> qdisc cake 8001: dev ice0p1 root refcnt 37 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0
>  Sent 9633155852 bytes 13658545 pkt (dropped 36165260, overlimits 0 requeues 42)
>
> Tried with 16 netperf UDP_STREAMs instead of xdp-trafficgen, and with
> that it's even worse (down to less than 100 PPS). A single netperf
> instance gets me back to the ~600k PPS range, so definitely something to
> do with contention.
>
> The drops seem to come from mainly two places:
>
> # dropwatch -l kas
> Initializing kallsyms db
> dropwatch> start
> Enabling monitoring...
> Kernel monitoring activated.
> Issue Ctrl-C to stop monitoring
> 1 drops at __netif_receive_skb_core.constprop.0+160 (0xffffffff87272de0) [software]
> 2132 drops at __dev_xmit_skb+3f5 (0xffffffff8726d475) [software]
> 1 drops at skb_queue_purge_reason+100 (0xffffffff8724e130) [software]
> 52901 drops at __dev_xmit_skb+3f5 (0xffffffff8726d475) [software]
> 153583 drops at __dev_xmit_skb+13c (0xffffffff8726d1bc) [software]
> 1 drops at __netif_receive_skb_core.constprop.0+160 (0xffffffff87272de0) [software]
> 93968 drops at __dev_xmit_skb+3f5 (0xffffffff8726d475) [software]
> 212982 drops at __dev_xmit_skb+13c (0xffffffff8726d1bc) [software]
> 239359 drops at __dev_xmit_skb+13c (0xffffffff8726d1bc) [software]
> 108219 drops at __dev_xmit_skb+3f5 (0xffffffff8726d475) [software]
> 191163 drops at __dev_xmit_skb+13c (0xffffffff8726d1bc) [software]
> 93300 drops at __dev_xmit_skb+3f5 (0xffffffff8726d475) [software]
> 131201 drops at __dev_xmit_skb+13c (0xffffffff8726d1bc) [software]
>
> +13c corresponds to the defer_count check in your patch:
>
>                         defer_count = atomic_long_inc_return(&q->defer_count);
>                         if (unlikely(defer_count > q->limit)) {
>                                 kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
>                                 return NET_XMIT_DROP;
>                         }
>
> and +3f5 is the to_free drop at the end of the function:
>
> unlock:
>         spin_unlock(root_lock);
>         if (unlikely(to_free))
>                 kfree_skb_list_reason(to_free,
>                                       tcf_get_drop_reason(to_free));
>
> Not sure why there's this difference between your setup or mine; some
> .config or hardware difference related to the use of atomics? Any other
> ideas?

I would think atomics do not depend on .config.

Hmmm... maybe some CONFIG_PREEMPT_RT stuff?

The CPU that won the race cannot make progress for some reason.

Qdisc can be restarted from ksoftirqd, and it might compete with your
user threads, because why not :)
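
To illustrate the hypothesis above, here is a minimal userspace sketch
(C11 atomics, not kernel code) of the pattern under discussion: producers
bump a counter and park packets on a lock-free list, and whoever drains
the list resets the counter. All toy_* names are hypothetical. That the
drainer resets the counter to zero, and that the drop path leaves the
counter untouched, are assumptions made for this sketch; the quoted
snippet only shows the increment and the limit check. If the draining
side stalls, every producer past the limit drops, which would look like
the +13c drops reported above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for skb and Qdisc; not the kernel structures. */
struct pkt {
	struct pkt *next;
};

struct toy_qdisc {
	_Atomic(struct pkt *) defer_list; /* lock-free LIFO of parked packets */
	atomic_long defer_count;          /* bumped by every enqueue attempt */
	long limit;                       /* plays the role of q->limit */
};

enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

static void toy_qdisc_init(struct toy_qdisc *q, long limit)
{
	assert(q != NULL);
	atomic_init(&q->defer_list, NULL);
	atomic_init(&q->defer_count, 0);
	q->limit = limit;
}

/* Producer side: same shape as the defer_count check quoted above. */
static int toy_defer_enqueue(struct toy_qdisc *q, struct pkt *p)
{
	long count = atomic_fetch_add(&q->defer_count, 1) + 1;
	struct pkt *head;

	/* Assumption: the counter is not decremented on the drop path. */
	if (count > q->limit)
		return TOY_XMIT_DROP;

	/* Push onto the list with a compare-and-swap loop. */
	head = atomic_load(&q->defer_list);
	do {
		p->next = head;
	} while (!atomic_compare_exchange_weak(&q->defer_list, &head, p));

	return TOY_XMIT_SUCCESS;
}

/* Drainer side: take the whole list, reset the counter, count the packets. */
static long toy_defer_drain(struct toy_qdisc *q)
{
	struct pkt *p = atomic_exchange(&q->defer_list, NULL);
	long n = 0;

	/* Assumption: the drainer resets the counter after grabbing the list. */
	atomic_store(&q->defer_count, 0);
	for (; p != NULL; p = p->next)
		n++;
	return n;
}
```

With a limit of 4 and no drain in between, the fifth and sixth enqueue
attempts are dropped; once toy_defer_drain() has run, enqueues succeed
again. The sketch only shows the counting logic and says nothing about
why the draining CPU would stall on the affected setup.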