Message-ID: <3d520952-c343-4ae5-98c0-c9965dc7e320@hetzner-cloud.de>
Date: Tue, 10 Jun 2025 16:40:43 +0200
From: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
with qdisc
On 06.06.25 at 11:06, Eric Dumazet wrote:
> On Thu, Jun 5, 2025 at 3:17 PM Marcus Wichelmann
> <marcus.wichelmann@...zner-cloud.de> wrote:
>>
>> On 06.06.25 at 00:11, Eric Dumazet wrote:
>>> On Thu, Jun 5, 2025 at 9:46 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>>>
>>>> On Thu, Jun 5, 2025 at 9:15 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>>>>
>>>>> Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
>>>>>> noticed that the veth-pair loses lots of packets when multiple TCP streams go
>>>>>> through it, resulting in stalling TCP connections and noticeable instabilities.
>>>>>>
>>>>>> This doesn't seem to be an issue with just XDP but rather occurs whenever the
>>>>>> NAPI mode of the veth driver is active.
>>>>>> I managed to reproduce the same behavior just by bringing the veth-pair into
>>>>>> NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
>>>>>> XDP")) and running multiple TCP streams through it using a network namespace.
>>>>>>
>>>>>> Here is how I reproduced it:
>>>>>>
>>>>>> ip netns add lb
>>>>>> ip link add dev to-lb type veth peer name in-lb netns lb
>>>>>>
>>>>>> # Enable NAPI
>>>>>> ethtool -K to-lb gro on
>>>>>> ethtool -K to-lb tso off
>>>>>> ip netns exec lb ethtool -K in-lb gro on
>>>>>> ip netns exec lb ethtool -K in-lb tso off
>>>>>>
>>>>>> ip link set dev to-lb up
>>>>>> ip -netns lb link set dev in-lb up
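>>>>>>
>>>>>> As a sanity check (feature names as printed by recent ethtool versions),
>>>>>> the resulting offload state can be verified with:
>>>>>>
>>>>>> ethtool -k to-lb | grep -E 'generic-receive-offload|tcp-segmentation-offload'
>>>>>> # expected output:
>>>>>> # generic-receive-offload: on
>>>>>> # tcp-segmentation-offload: off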
>>>>>>
>>>>>> Then run an HTTP server inside the "lb" namespace that serves a large file:
>>>>>>
>>>>>> fallocate -l 10G testfiles/10GB.bin
>>>>>> caddy file-server --root testfiles/
>>>>>>
>>>>>> Download this file from within the root namespace multiple times in parallel:
>>>>>>
>>>>>> curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
>>>>>>
>>>>>> In my tests, I ran four parallel curls at the same time, and after just a
>>>>>> few seconds, three of them stalled while the remaining one "won" the full
>>>>>> bandwidth and completed the download.
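>>>>>>
>>>>>> For example, a small shell loop to start the four downloads (URL elided
>>>>>> here as above):
>>>>>>
>>>>>> for i in 1 2 3 4; do
>>>>>>     curl "http://[fe80::...%to-lb]/10GB.bin" -o /dev/null &
>>>>>> done
>>>>>> wait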
>>>>>>
>>>>>> This is probably a result of the veth's ptr_ring running full, causing many
>>>>>> packet drops on TX, and the TCP congestion control reacting to that.
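>>>>>>
>>>>>> While the test runs, the drops should show up as a climbing TX "dropped"
>>>>>> counter on the transmitting peer, e.g.:
>>>>>>
>>>>>> ip netns exec lb ip -s link show dev in-lb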
>>>>>>
>>>>>> In this context, I also came across Jesper's patch, which describes a very
>>>>>> similar issue and should help to resolve this:
>>>>>> commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
>>>>>> reduce TX drops")
>>>>>>
>>>>>> But when repeating the above test with the latest mainline, which includes
>>>>>> this patch, and enabling a qdisc via
>>>>>> tc qdisc add dev in-lb root sfq perturb 10
>>>>>> the kernel crashed just after starting the second TCP stream (see output
>>>>>> below).
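>>>>>>
>>>>>> For reference, the qdisc state and its drop/requeue counters can be
>>>>>> inspected with:
>>>>>>
>>>>>> tc -s qdisc show dev in-lb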
>>>>>>
>>>>>> So I have two questions:
>>>>>> - Is my understanding of the described issue correct and is Jesper's patch
>>>>>> sufficient to solve this?
>>>>>
>>>>> Hmm, yeah, this does sound likely.
>>>>>
>>>>>> - Is my qdisc configuration to make use of this patch correct, and is the
>>>>>> kernel crash likely a bug?
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
>>>>>> index 65535 is out of range for type 'sfq_head [128]'
>>>>>
>>>>> This (the 'index 65535') kinda screams "integer underflow". So certainly
>>>>> looks like a kernel bug, yeah. Don't see any obvious reason why Jesper's
>>>>> patch would trigger this; maybe Eric has an idea?
>>>>>
>>>>> Does this happen with other qdiscs as well, or is it specific to sfq?
>>>>
>>>> This seems like a bug in sfq; we already had recent fixes in it, and
>>>> other fixes in net/sched around qdisc_tree_reduce_backlog().
>>>>
>>>> It is possible qdisc_pkt_len() could be wrong in this use case (TSO off ?)
>>>
>>> This seems to be a very old bug, indeed caused by sch->gso_skb
>>> contribution to sch->q.qlen
>>>
>>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
>>> index b912ad99aa15d95b297fb28d0fd0baa9c21ab5cd..77fa02f2bfcd56a36815199aa2e7987943ea226f
>>> 100644
>>> --- a/net/sched/sch_sfq.c
>>> +++ b/net/sched/sch_sfq.c
>>> @@ -310,7 +310,10 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
>>>                 /* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
>>>                 x = q->tail->next;
>>>                 slot = &q->slots[x];
>>> -               q->tail->next = slot->next;
>>> +               if (slot->next == x)
>>> +                       q->tail = NULL; /* no more active slots */
>>> +               else
>>> +                       q->tail->next = slot->next;
>>>                 q->ht[slot->hash] = SFQ_EMPTY_SLOT;
>>>                 goto drop;
>>>         }
>>>
>>
>> Hi,
>>
>> Thank you for looking into it.
>> I'll give your patch a try and will also test other qdiscs when I'm back
>> in the office.
>>
>
> I have been using this repro:
>
> [...]
Hi,

I can confirm that the sfq qdisc is now stable in this setup, thanks to your fix.
I also experimented with other qdiscs, and fq_codel works as well.

Together with Jesper's patch, the sfq and fq_codel qdiscs now resolve the
original issue: multiple TCP connections run very stably, even with NAPI/XDP
active on the veth device, and I can see that packets are being requeued
instead of being dropped in the veth driver.
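
For reference, the requeues show up in the qdisc statistics, e.g.:

  tc -s qdisc show dev in-lb

where the "requeues" counter now increases instead of the interface's TX
"dropped" counter.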
Thank you for your help!
Marcus