Message-ID: <5f19b555-b0fa-472a-a5f3-6673c0b69c5c@hetzner-cloud.de>
Date: Fri, 6 Jun 2025 00:17:37 +0200
From: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>
To: Eric Dumazet <edumazet@...gle.com>,
Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
with qdisc
On 06.06.25 at 00:11, Eric Dumazet wrote:
> On Thu, Jun 5, 2025 at 9:46 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>
>> On Thu, Jun 5, 2025 at 9:15 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>>
>>> Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:
>>>
>>>> Hi,
>>>>
>>>> while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
>>>> noticed that the veth-pair loses lots of packets when multiple TCP streams go
>>>> through it, resulting in stalling TCP connections and noticeable instabilities.
>>>>
>>>> This doesn't seem to be an issue with just XDP but rather occurs whenever the
>>>> NAPI mode of the veth driver is active.
>>>> I managed to reproduce the same behavior just by bringing the veth-pair into
>>>> NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
>>>> XDP")) and running multiple TCP streams through it using a network namespace.
>>>>
>>>> Here is how I reproduced it:
>>>>
>>>> ip netns add lb
>>>> ip link add dev to-lb type veth peer name in-lb netns lb
>>>>
>>>> # Enable NAPI
>>>> ethtool -K to-lb gro on
>>>> ethtool -K to-lb tso off
>>>> ip netns exec lb ethtool -K in-lb gro on
>>>> ip netns exec lb ethtool -K in-lb tso off
>>>>
>>>> ip link set dev to-lb up
>>>> ip -netns lb link set dev in-lb up
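
Not part of the original steps, but a quick sanity check that the offload
toggles took effect on both peers (assuming the interface names above):

ethtool -k to-lb | grep -E 'generic-receive-offload|tcp-segmentation-offload'
ip netns exec lb ethtool -k in-lb | grep -E 'generic-receive-offload|tcp-segmentation-offload'
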
>>>>
>>>> Then run an HTTP server inside the "lb" namespace that serves a large file:
>>>>
>>>> fallocate -l 10G testfiles/10GB.bin
>>>> caddy file-server --root testfiles/
>>>>
>>>> Download this file from within the root namespace multiple times in parallel:
>>>>
>>>> curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
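
The link-local address is elided above; if needed, in-lb's address can be
read out with something like this (assuming the namespace setup from the
beginning):

ip -netns lb -6 addr show dev in-lb scope link
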
>>>>
>>>> In my tests, I ran four parallel curls at the same time; after just a few
>>>> seconds, three of them stalled while the remaining one "won" the full
>>>> bandwidth and completed its download.
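
A rough sketch of how such a parallel run can be scripted; the elided
link-local address still has to be filled in by hand:

URL='http://[fe80::...%to-lb]/10GB.bin'
for i in 1 2 3 4; do curl "$URL" -o /dev/null & done
wait
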
>>>>
>>>> This is probably a result of the veth's ptr_ring running full, causing many
>>>> packet drops on TX, and the TCP congestion control reacting to that.
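
Not in the original report, but the drops are easy to watch while the
downloads run; with the server inside "lb", the bulk data is transmitted by
in-lb, so its TX "dropped" counter is the interesting one:

ip netns exec lb ip -s link show dev in-lb
ip -s link show dev to-lb
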
>>>>
>>>> In this context, I also took notice of Jesper's patch which describes a very
>>>> similar issue and should help to resolve this:
>>>> commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
>>>> reduce TX drops")
>>>>
>>>> But when repeating the above test with the latest mainline, which includes this
>>>> patch, and enabling a qdisc via
>>>> tc qdisc add dev in-lb root sfq perturb 10
>>>> the kernel crashed just after starting the second TCP stream (see output below).
>>>>
>>>> So I have two questions:
>>>> - Is my understanding of the described issue correct, and is Jesper's patch
>>>> sufficient to solve this?
>>>
>>> Hmm, yeah, this does sound likely.
>>>
>>>> - Is my qdisc configuration to make use of this patch correct, and is the
>>>> kernel crash likely a bug?
>>>>
>>>> ------------[ cut here ]------------
>>>> UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
>>>> index 65535 is out of range for type 'sfq_head [128]'
>>>
>>> This (the 'index 65535') kinda screams "integer underflow". So certainly
>>> looks like a kernel bug, yeah. Don't see any obvious reason why Jesper's
>>> patch would trigger this; maybe Eric has an idea?
>>>
>>> Does this happen with other qdiscs as well, or is it specific to sfq?
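
For that follow-up test, swapping the qdisc should be enough, e.g. (assuming
the same namespace layout as in the reproducer):

ip netns exec lb tc qdisc replace dev in-lb root fq_codel
ip netns exec lb tc qdisc replace dev in-lb root fq
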
>>
>> This seems like a bug in sfq; we already had recent fixes in it, and
>> other fixes in net/sched vs qdisc_tree_reduce_backlog().
>>
>> It is possible qdisc_pkt_len() could be wrong in this use case (TSO off ?)
>
> This seems to be a very old bug, indeed caused by sch->gso_skb
> contribution to sch->q.qlen
>
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index b912ad99aa15d95b297fb28d0fd0baa9c21ab5cd..77fa02f2bfcd56a36815199aa2e7987943ea226f 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -310,7 +310,10 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
>  		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
>  		x = q->tail->next;
>  		slot = &q->slots[x];
> -		q->tail->next = slot->next;
> +		if (slot->next == x)
> +			q->tail = NULL; /* no more active slots */
> +		else
> +			q->tail->next = slot->next;
>  		q->ht[slot->hash] = SFQ_EMPTY_SLOT;
>  		goto drop;
>  	}
>
Hi,
Thank you for looking into it.
I'll give your patch a try and will also test with other qdiscs when I'm back
in the office.
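
Reading the diff, it seems to cover the case where the dropped slot is the
only remaining active one: q->tail is then reset to NULL instead of being left
pointing at the now-empty slot. A rough re-test plan, assuming a kernel
rebuilt with the patch applied:

ip netns exec lb tc qdisc replace dev in-lb root sfq perturb 10
# then repeat the parallel downloads from above and watch for the UBSAN splat
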
Marcus