netdev - Re: [PATCH net-next V6 2/2] veth: apply qdisc backpressure on full ptr

Message-ID: <d36cb5a0-902c-4de5-bdd2-cbf9e1b1c7b1@kernel.org>
Date: Fri, 25 Apr 2025 15:55:52 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, tom@...bertland.com,
 Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
 <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
 Toke Høiland-Jørgensen <toke@...e.dk>,
 dsahern@...nel.org, makita.toshiaki@....ntt.co.jp,
 kernel-team@...udflare.com, phil@....cc
Subject: Re: [PATCH net-next V6 2/2] veth: apply qdisc backpressure on full
 ptr_ring to reduce TX drops

On 24/04/2025 17.53, Jakub Kicinski wrote:
> On Thu, 24 Apr 2025 17:24:51 +0200 Jesper Dangaard Brouer wrote:
>>> Looks like I wrote a reply to v5 but didn't hit send. But I may have
>>> set v5 to Changes Requested because of it :S Here is my comment:
>>>
>>>    I think this is missing a memory barrier. When drivers do this dance
>>>    there's usually a barrier between stop and recheck, to make sure the
>>>    stop is visible before we check. And vice versa veth_xdp_rcv() needs
>>>    to make sure other side sees the "empty" indication before it checks
>>>    if the queue is stopped.
>>
>> The call netif_tx_stop_queue(txq); already contains a memory barrier
>> smp_mb__before_atomic() plus an atomic set_bit operation.  That should
>> be sufficient.
> 
> That barrier is _before_ stopping the queue. I'm saying we need a
> barrier between stop and emptiness re-check. Note that:
>   - smp_mb__after_atomic() is enough, and it 'compiles' to nothing
>     on x86

I see. I will add an smp_mb__after_atomic() after netif_tx_stop_queue()
and send a V7.  I had assumed an atomic operation implies a full memory
barrier, which I guess is correct for x86 (as you say, it compiles to
nothing there), but I guess other archs need it, so let's add it.
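
To make the ordering concrete, here is a minimal user-space sketch of the
stop / barrier / re-check sequence, using C11 atomics instead of kernel
primitives.  It is not the veth patch: struct sketch_queue and
sketch_stop_and_recheck() are hypothetical names, and the seq_cst fence
only stands in for the role smp_mb__after_atomic() plays in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins (not kernel types): a TXQ stop flag and the
 * number of entries currently sitting in the ring. */
struct sketch_queue {
	atomic_bool stopped;
	atomic_int  ring_len;
};

/* Producer side, called after an enqueue failed because the ring was full.
 * Returns true if the queue stays stopped, false if the stop was undone. */
static bool sketch_stop_and_recheck(struct sketch_queue *q)
{
	assert(q);

	/* 1. Stop the queue (models netif_tx_stop_queue()). */
	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);

	/* 2. Full barrier between the stop and the re-check, so the stop
	 *    is visible before the fill level is read. */
	atomic_thread_fence(memory_order_seq_cst);

	/* 3. Re-check: if the consumer emptied the ring in the meantime,
	 *    nobody is left to wake us, so undo the stop ourselves. */
	if (atomic_load_explicit(&q->ring_len, memory_order_relaxed) == 0) {
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
		return false;
	}
	return true;
}
```

The sketch only covers the producer.  As noted in the quoted comment, the
consumer needs the mirror image: make the "empty" state visible, issue a
full barrier, and only then check whether the queue is stopped.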

>   - all of this is the unlikely path :) You restart the qdisc
>     when the ptr ring is completely full so the stopping in absolute
>     worst case will happen once or twice per full ptr_ring ?
> 

Yes, basically. It should only happen once per full ptr_ring event.
As soon as the TXQ is stopped, the driver code is no longer called.
Do remember that the remote CPU running veth_poll will (re)start the
TXQ via the qdisc layer, which calls the veth driver code again, e.g.
racing to fill the ptr_ring again, which will stop the TXQ again.
(Sysadmin help: these full/TXQ-stop events are recorded in the
"requeues" counter in the qdisc stats.)  The remote CPU running NAPI is
in a fairly tight loop, so it will do its best to empty the queue, and
it has a total budget of 300.  The race is still very unlikely, but it
is a race, and hitting it would stop the TXQ for the veth device
forever (we don't recover).
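
The stuck-TXQ case can be replayed deterministically with a small
single-threaded model.  This is only an illustration under assumptions:
struct model and the model_*() helpers are hypothetical, the fill level
256 is arbitrary, and the model replays one interleaving in program
order, so it shows why the re-check is needed but says nothing about
memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not veth code. */
struct model {
	int  ring_len;     /* entries currently in the ring */
	bool txq_stopped;  /* models the TXQ stopped state */
};

/* Consumer step: drain the ring, then wake the TXQ only if it is
 * already seen as stopped. */
static void model_consumer_drain(struct model *m)
{
	m->ring_len = 0;
	if (m->txq_stopped)
		m->txq_stopped = false;
}

/* Producer step after a failed enqueue: stop the TXQ and, if asked,
 * re-check whether the ring has been emptied in the meantime. */
static void model_producer_stop(struct model *m, bool recheck)
{
	m->txq_stopped = true;
	if (recheck && m->ring_len == 0)
		m->txq_stopped = false;  /* consumer already drained: undo */
}

/* Replay: the producer saw a full ring, the consumer drains before the
 * stop lands, then the producer stops the TXQ.  Returns true if the TXQ
 * ends up stopped with an empty ring, i.e. nobody will ever wake it. */
static bool model_replay_is_stuck(bool recheck)
{
	struct model m = { .ring_len = 256, .txq_stopped = false };

	/* 1. producer: enqueue fails, ring is full (not modelled) */
	model_consumer_drain(&m);          /* 2. consumer drains, TXQ not stopped yet */
	model_producer_stop(&m, recheck);  /* 3. producer stops the TXQ */

	return m.txq_stopped && m.ring_len == 0;
}
```

Without the re-check, model_replay_is_stuck(false) reports a stuck TXQ;
with it, model_replay_is_stuck(true) does not.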

>> And the other side veth_poll(), have a smp_store_mb() before reading
>> ptr_ring.
>>
>> --Jesper
>>
>> p.s.
>> I actually had an alternative implementation of this, that only calls
>> stop when it is needed.  See below, it kind of looks prettier, but it
>> adds an extra memory barrier in the likely path. (And I'm not sure if
>> read memory barrier is strong enough).
> 
> Not sure that works either :S