[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87v7k0e8qz.fsf@toke.dk>
Date: Mon, 27 Oct 2025 15:09:56 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, netdev@...r.kernel.org,
makita.toshiaki@....ntt.co.jp
Cc: Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, ihor.solodrai@...ux.dev, toshiaki.makita1@...il.com,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...udflare.com
Subject: Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting
stalled TXQs
Jesper Dangaard Brouer <hawk@...nel.org> writes:
> On 24/10/2025 15.39, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <hawk@...nel.org> writes:
>>
>>> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
>>> backpressure on full ptr_ring to reduce TX drops") have been found to cause
>>> a race condition in production environments.
>>>
>>> Under specific circumstances, observed exclusively on ARM64 (aarch64)
>>> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
>>> permanently stalled. This happens when the race condition leads to the TXQ
>>> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
>>> preventing the attached qdisc from dequeueing packets and causing the
>>> network link to halt.
>>>
>>> As a first step towards resolving this issue, this patch introduces a
>>> failsafe mechanism. It enables the net device watchdog by setting a timeout
>>> value and implements the .ndo_tx_timeout callback.
>>>
>>> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
>>> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
>>> and allow traffic to resume.
>>>
>>> The log message will look like this:
>>>
>>> veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
>>> veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
>>>
>>> This provides a necessary recovery mechanism while the underlying race
>>> condition is investigated further. Subsequent patches will address the root
>>> cause and add more robust state handling in ndo_open/ndo_stop.
>>>
>>> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
>>> Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>
>>> ---
>>> drivers/net/veth.c | 16 +++++++++++++++-
>>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index a3046142cb8e..7b1a9805b270 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -959,8 +959,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>> rq->stats.vs.xdp_packets += done;
>>> u64_stats_update_end(&rq->stats.syncp);
>>>
>>> - if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
>>> + if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq))) {
>>> + txq_trans_cond_update(peer_txq);
>>> netif_tx_wake_queue(peer_txq);
>>> + }
>>
>> Hmm, seems a bit weird that this call to txq_trans_cond_update() is only
>> in veth_xdp_recv(). Shouldn't there (also?) be one in veth_xmit()?
>>
>
> The veth_xmit() call (indirectly) *do* update the txq_trans start
> timestamp, but only for return code NET_RX_SUCCESS / NETDEV_TX_OK.
> As .ndo_start_xmit = veth_xmit and netdev_start_xmit[1] will call
> txq_trans_update on NETDEV_TX_OK.
Ah, right; didn't think of checking the caller, thanks for the pointer :)
> This call to txq_trans_cond_update() isn't strictly necessary, as
> veth_xmit() call will update it later, and the netif_tx_stop_queue()
> call also updates trans_start.
>
> I primarily added it because other drivers that use BQL have their
> helper functions update txq_trans. As I see the veth implementation as
> a simplified BQL, that we hopefully can extend to become more dynamic
> like BQL.
>
> Do you prefer that I remove this? (call to txq_trans_cond_update)
Hmm, don't we need it for the XDP path? I.e., if there's no traffic
other than XDP_REDIRECT traffic, ndo_start_xmit() will not get called,
so we need some way other to keep the watchdog from firing, I think?
-Toke
Powered by blists - more mailing lists