Message-ID: <d36cb5a0-902c-4de5-bdd2-cbf9e1b1c7b1@kernel.org>
Date: Fri, 25 Apr 2025 15:55:52 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, tom@...bertland.com,
Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
<davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
Toke Høiland-Jørgensen <toke@...e.dk>,
dsahern@...nel.org, makita.toshiaki@....ntt.co.jp,
kernel-team@...udflare.com, phil@....cc
Subject: Re: [PATCH net-next V6 2/2] veth: apply qdisc backpressure on full
ptr_ring to reduce TX drops
On 24/04/2025 17.53, Jakub Kicinski wrote:
> On Thu, 24 Apr 2025 17:24:51 +0200 Jesper Dangaard Brouer wrote:
>>> Looks like I wrote a reply to v5 but didn't hit send. But I may have
>>> set v5 to Changes Requested because of it :S Here is my comment:
>>>
>>> I think this is missing a memory barrier. When drivers do this dance
>>> there's usually a barrier between stop and recheck, to make sure the
>>> stop is visible before we check. And vice versa veth_xdp_rcv() needs
>>> to make sure other side sees the "empty" indication before it checks
>>> if the queue is stopped.
>>
>> The call netif_tx_stop_queue(txq); already contains a memory barrier
>> smp_mb__before_atomic() plus an atomic set_bit operation. That should
>> be sufficient.
>
> That barrier is _before_ stopping the queue. I'm saying we need a
> barrier between stop and emptiness re-check. Note that:
> - smp_mb__after_atomic() is enough, and it 'compiles' to nothing
> on x86
I see, I will add a smp_mb__after_atomic() after netif_tx_stop_queue()
and send a V7. I had considered an atomic operation to be a full memory
barrier, which I guess is correct for x86 (as you say this compiles to
nothing), but other archs need the explicit barrier, so let's add it.
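
To make the intended ordering concrete, here is a rough sketch of the
producer-side dance (the helper name and exact placement are mine, not
the actual patch code; it only uses helpers from <linux/netdevice.h>
and <linux/ptr_ring.h>):

  /* Illustrative only: stop the txq, then re-check emptiness.
   * smp_mb__after_atomic() orders the stop (atomic set_bit in
   * netif_tx_stop_queue()) before the __ptr_ring_empty() read,
   * pairing with the barrier on the consumer/NAPI side.
   */
  static void veth_stop_and_recheck(struct netdev_queue *txq,
                                    struct ptr_ring *ring)
  {
          netif_tx_stop_queue(txq);
          smp_mb__after_atomic();

          /* If the consumer drained the ring meanwhile, it may have
           * missed the stopped state, so undo the stop ourselves.
           */
          if (__ptr_ring_empty(ring))
                  netif_tx_wake_queue(txq);
  }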
> - all of this is the unlikely path :) You restart the qdisc
> when the ptr ring is completely full so the stopping in absolute
> worst case will happen once or twice per full ptr_ring ?
>
Yes, basically. It should only happen once per full ptr_ring event.
As soon as the TXQ is stopped, the driver code is no longer called.
Remember that the remote CPU running the veth_poll call will (re)start
the TXQ again via the qdisc layer, which calls the veth driver code
again, i.e. it races to fill the ptr_ring again, and that will stop the
TXQ again. (Sysadmin hint: these full/TXQ-stop events are recorded in
the "requeues" counter of the qdisc stats.) The remote CPU running NAPI
is in a fairly tight loop, so it will do its best to empty the queue,
and has a total budget of 300. The race is still very unlikely, but it
is a race that would stop the TXQ forever for the veth device (we don't
recover).
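
The consumer/NAPI side could pair with the producer roughly like this
(again only a sketch to show where the pairing conceptually sits, not
the patch itself):

  /* Illustrative only: after the poll loop has consumed entries from
   * the ptr_ring, make that consumption visible before reading the
   * stopped bit, then restart the peer txq so the qdisc resumes.
   */
  static void veth_maybe_wake_peer(struct netdev_queue *peer_txq)
  {
          /* Pairs with smp_mb__after_atomic() on the xmit side. */
          smp_mb();

          if (unlikely(netif_tx_queue_stopped(peer_txq)))
                  netif_tx_wake_queue(peer_txq);
  }

In the real code the existing smp_store_mb() in veth_poll() may already
provide enough ordering here; the explicit smp_mb() above is just to
illustrate the pairing.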
>> And the other side, veth_poll(), has a smp_store_mb() before reading
>> the ptr_ring.
>>
>> --Jesper
>>
>> p.s.
>> I actually had an alternative implementation of this, that only calls
>> stop when it is needed. See below, it kind of looks prettier, but it
>> adds an extra memory barrier in the likely path. (And I'm not sure if
>> read memory barrier is strong enough).
>
> Not sure that works either :S