linux-kernel - Re: [PATCH net V1 1/3] veth: enable dev

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b6d13746-7921-4825-97cc-7136cdccafde@kernel.org>
Date: Mon, 27 Oct 2025 12:41:36 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
 netdev@...r.kernel.org, makita.toshiaki@....ntt.co.jp
Cc: Eric Dumazet <eric.dumazet@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, ihor.solodrai@...ux.dev,
 toshiaki.makita1@...il.com, bpf@...r.kernel.org,
 linux-kernel@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting
 stalled TXQs

On 24/10/2025 15.39, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <hawk@...nel.org> writes:
> 
>> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
>> backpressure on full ptr_ring to reduce TX drops") have been found to cause
>> a race condition in production environments.
>>
>> Under specific circumstances, observed exclusively on ARM64 (aarch64)
>> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
>> permanently stalled. This happens when the race condition leads to the TXQ
>> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
>> preventing the attached qdisc from dequeueing packets and causing the
>> network link to halt.
>>
>> As a first step towards resolving this issue, this patch introduces a
>> failsafe mechanism. It enables the net device watchdog by setting a timeout
>> value and implements the .ndo_tx_timeout callback.
>>
>> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
>> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
>> and allow traffic to resume.
>>
>> The log message will look like this:
>>
>>   veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
>>   veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
>>
>> This provides a necessary recovery mechanism while the underlying race
>> condition is investigated further. Subsequent patches will address the root
>> cause and add more robust state handling in ndo_open/ndo_stop.
>>
>> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
>> Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>
>> ---
>>   drivers/net/veth.c |   16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>> index a3046142cb8e..7b1a9805b270 100644
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -959,8 +959,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>   	rq->stats.vs.xdp_packets += done;
>>   	u64_stats_update_end(&rq->stats.syncp);
>>   
>> -	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
>> +	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq))) {
>> +		txq_trans_cond_update(peer_txq);
>>   		netif_tx_wake_queue(peer_txq);
>> +	}
> 
> Hmm, seems a bit weird that this call to txq_trans_cond_update() is only
> in veth_xdp_recv(). Shouldn't there (also?) be one in veth_xmit()?
> 

The veth_xmit() call (indirectly) *do* update the txq_trans start
timestamp, but only for return code NET_RX_SUCCESS / NETDEV_TX_OK.
As .ndo_start_xmit = veth_xmit and netdev_start_xmit[1] will call
txq_trans_update on NETDEV_TX_OK.

This call to txq_trans_cond_update() isn't strictly necessary, as
veth_xmit() call will update it later, and the netif_tx_stop_queue()
call also updates trans_start.

I primarily added it because other drivers that use BQL have their
helper functions update txq_trans.  As I see the veth implementation as
a simplified BQL, that we hopefully can extend to become more dynamic
like BQL.

Do you prefer that I remove this?  (call to txq_trans_cond_update)

--Jesper

[1] 
https://elixir.bootlin.com/linux/v6.17.5/source/include/linux/netdevice.h#L5222-L5233