netdev - Re: [PATCH net V2 2/2] veth: more robust handing of race to avoid txq getting stuck

Message-ID: <27e74aeb-89f5-4547-8ecc-232570e2644c@kernel.org>
Date: Wed, 29 Oct 2025 11:33:23 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Toshiaki Makita <toshiaki.makita1@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, ihor.solodrai@...ux.dev,
 "Michael S. Tsirkin" <mst@...hat.com>, makita.toshiaki@....ntt.co.jp,
 bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, kernel-team@...udflare.com,
 netdev@...r.kernel.org, Toke Høiland-Jørgensen
 <toke@...e.dk>
Subject: Re: [PATCH net V2 2/2] veth: more robust handing of race to avoid txq
 getting stuck

On 28/10/2025 15.56, Toshiaki Makita wrote:
> On 2025/10/28 5:05, Jesper Dangaard Brouer wrote:
> 
>> (1) In veth_xmit(), the racy conditional wake-up logic and its memory
>> barrier are removed. Instead, after stopping the queue, we
>> unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI
>> consumer is scheduled, making it solely responsible for re-waking the TXQ.
> 
> Maybe another option is to use !ptr_ring_full() instead of ptr_ring_empty()?

Nope, that will not work.
I think MST will agree.
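Point (1) can be pictured with a hypothetical single-threaded user-space model of the producer side. Everything here (model_ring, model_peer, model_xmit, model_schedule_consumer, RING_SIZE) is invented for illustration and is not the kernel's ptr_ring or netdev API; the two flags merely stand in for netif_tx_queue_stopped() and a pending NAPI poll. The point it shows is only the shape of the change: on a full ring the producer stops the queue and then kicks the consumer unconditionally, instead of re-checking ring state itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: not the kernel's ptr_ring/netdev API. */
#define RING_SIZE 4

struct model_ring {
	int slots[RING_SIZE];
	size_t head;  /* next slot to produce into */
	size_t tail;  /* next slot to consume from */
	size_t count; /* number of queued entries */
};

struct model_peer {
	struct model_ring ring;
	bool txq_stopped;    /* stand-in for netif_tx_queue_stopped() */
	bool napi_scheduled; /* stand-in for a pending NAPI poll */
};

static bool model_ring_produce(struct model_ring *r, int v)
{
	if (r->count == RING_SIZE)
		return false; /* ring full */
	r->slots[r->head] = v;
	r->head = (r->head + 1) % RING_SIZE;
	r->count++;
	return true;
}

/* Stand-in for __veth_xdp_flush(): ensure the consumer will run. */
static void model_schedule_consumer(struct model_peer *p)
{
	p->napi_scheduled = true;
}

/* Returns true if queued, false if the caller must retry later
 * (the NETDEV_TX_BUSY case in the real driver). */
static bool model_xmit(struct model_peer *p, int pkt)
{
	if (model_ring_produce(&p->ring, pkt)) {
		model_schedule_consumer(p);
		return true;
	}
	/* Ring full: stop the queue, then kick the consumer without any
	 * conditional re-check. The consumer alone re-wakes the queue. */
	p->txq_stopped = true;
	model_schedule_consumer(p);
	return false;
}
```

Because the model is single-threaded it says nothing about memory ordering; it only makes explicit that, after this change, no producer-side path leaves a stopped queue without a scheduled consumer.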

> I'm not sure which is better. Anyway I'm ok with your approach.
> 
> ...
> 
>> (3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is
>> about to complete (napi_complete_done), it now also checks if the peer TXQ
>> is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will
>> reschedule itself. This prevents a new race where the producer stops the
>> queue just as the consumer is finishing its poll, ensuring the wakeup 
>> is not missed.
> ...
> 
>> @@ -986,7 +979,8 @@ static int veth_poll(struct napi_struct *napi, int budget)
>>       if (done < budget && napi_complete_done(napi, done)) {
>>           /* Write rx_notify_masked before reading ptr_ring */
>>           smp_store_mb(rq->rx_notify_masked, false);
>> -        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
>> +        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring) ||
>> +                 (peer_txq && netif_tx_queue_stopped(peer_txq)))) {
> 
> Not sure if this is necessary.

How sure are you that this isn't necessary?

> From the commit log, your intention seems to be making sure to wake up the
> queue, but you wake up the queue immediately after this hunk in the same
> function, so isn't that guaranteed without scheduling another NAPI?
> 

The above code catches the case where the ptr_ring is empty and the
tx_queue is stopped. It feels wrong not to react in this case, but you
*might* be right that it isn't strictly necessary, because the code below
will also call netif_tx_wake_queue(), which *should* have an SKB stored
that will *indirectly* trigger a restart of the NAPI.

I will stare some more at the code to see if I can convince myself that
we don't have to catch this case.
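A hypothetical single-threaded model of the end of veth_poll() with both hunks applied may help frame the question. The names (model_rq, model_poll and its fields) are invented for illustration and do not match the kernel's struct veth_rq. The model cannot reproduce the concurrent race; it only shows that, within one poll, the end-of-poll wake clears a stopped peer TXQ whether or not the completion check also re-arms the poll.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: not the kernel's struct veth_rq. */
struct model_rq {
	int queued;        /* entries currently in the ring */
	bool peer_stopped; /* peer TXQ stopped by the producer */
	bool rescheduled;  /* poll re-armed itself instead of completing */
	int wakeups;       /* times the peer TXQ was woken */
};

/* Returns the number of entries consumed, like a NAPI poll's "done". */
static int model_poll(struct model_rq *rq, int budget)
{
	int done = 0;

	while (done < budget && rq->queued > 0) {
		rq->queued--;
		done++;
	}

	rq->rescheduled = false;
	if (done < budget) {
		/* Completion check from the first hunk: re-arm if work
		 * remains or the peer TXQ is still stopped. */
		if (rq->queued > 0 || rq->peer_stopped)
			rq->rescheduled = true;
	}

	/* Second hunk: "Release backpressure per NAPI poll". */
	if (rq->peer_stopped) {
		rq->peer_stopped = false;
		rq->wakeups++;
	}
	return done;
}
```

In this model the extra reschedule only costs one empty poll. Whether it is needed in the real driver depends on producer/consumer interleavings that a single thread cannot exercise.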

Please also answer: how sure are you that this isn't necessary?

>>               if (napi_schedule_prep(&rq->xdp_napi)) {
>>                   WRITE_ONCE(rq->rx_notify_masked, true);
>>                   __napi_schedule(&rq->xdp_napi);
>> @@ -998,6 +992,13 @@ static int veth_poll(struct napi_struct *napi, int budget)
>>           veth_xdp_flush(rq, &bq);
>>       xdp_clear_return_frame_no_direct();
>> +    /* Release backpressure per NAPI poll */
>> +    smp_rmb(); /* Paired with netif_tx_stop_queue set_bit */
>> +    if (peer_txq && netif_tx_queue_stopped(peer_txq)) {
>> +        txq_trans_cond_update(peer_txq);
>> +        netif_tx_wake_queue(peer_txq);
>> +    }
>> +
>>       return done;
>>   }
> 
> -- 
> Toshiaki Makita