Message-ID: <176123157173.2281302.7040578942230212638.stgit@firesoul>
Date: Thu, 23 Oct 2025 16:59:31 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: netdev@...r.kernel.org, makita.toshiaki@....ntt.co.jp
Cc: Jesper Dangaard Brouer <hawk@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, ihor.solodrai@...ux.dev,
toshiaki.makita1@...il.com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...udflare.com
Subject: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs

The changes introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops") have been found to cause
a race condition in production environments.

Under specific circumstances, observed exclusively on ARM64 (aarch64)
systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
permanently stalled. This happens when the race condition leads to the TXQ
entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
preventing the attached qdisc from dequeueing packets and causing the
network link to halt.

As a first step towards resolving this issue, this patch introduces a
failsafe mechanism. It enables the net device watchdog by setting a timeout
value and implements the .ndo_tx_timeout callback.

If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
which logs a warning and calls netif_tx_wake_queue() to unstall the queue
and allow traffic to resume.

The log message will look like this:

  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable

This provides a necessary recovery mechanism while the underlying race
condition is investigated further. Subsequent patches will address the root
cause and add more robust state handling in ndo_open/ndo_stop.
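
For readers unfamiliar with the net device watchdog, the recovery path
described above can be modeled in user space roughly as follows. This is
only a sketch, not kernel code: every model_* name is hypothetical, time is
counted in plain milliseconds instead of jiffies, and the check is a
simplification of the stopped-queue test in the core watchdog (a queue that
is stopped and has made no progress for longer than watchdog_timeo).

```c
#include <stdbool.h>

#define MODEL_NUM_TXQ 1

/* Hypothetical stand-in for struct netdev_queue. */
struct model_txq {
	bool stopped;                /* models the driver XOFF state */
	unsigned long trans_start;   /* last time the queue made progress, in ms */
	unsigned long trans_timeout; /* number of watchdog timeouts seen so far */
};

/* Hypothetical stand-in for struct net_device. */
struct model_dev {
	unsigned long watchdog_timeo; /* ms; the patch uses 5000 */
	struct model_txq txq[MODEL_NUM_TXQ];
};

/* Models the normal wake-up path: refresh the timestamp, then wake. */
static void model_wake(struct model_txq *txq, unsigned long now)
{
	txq->trans_start = now;
	txq->stopped = false;
}

/* Models the timeout callback: the failsafe simply re-enables the queue. */
static void model_tx_timeout(struct model_dev *dev, unsigned int i)
{
	dev->txq[i].stopped = false;
}

/*
 * Models one watchdog pass. Returns the index of the first queue that
 * timed out (after running the timeout callback on it), or -1 if no
 * queue is stalled.
 */
static int model_watchdog(struct model_dev *dev, unsigned long now)
{
	for (unsigned int i = 0; i < MODEL_NUM_TXQ; i++) {
		struct model_txq *txq = &dev->txq[i];

		if (txq->stopped &&
		    now - txq->trans_start > dev->watchdog_timeo) {
			txq->trans_timeout++;
			model_tx_timeout(dev, i);
			return (int)i;
		}
	}
	return -1;
}
```

In this model a queue stopped at time 0 with a 5000 ms timeout is left
alone by a pass at 4000 ms and re-enabled by a pass at 5393 ms, which
mirrors the example log above. A queue woken through model_wake() gets a
fresh timestamp, so a later stop is measured from that wake-up.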
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>
---
drivers/net/veth.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a3046142cb8e..7b1a9805b270 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -959,8 +959,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 	rq->stats.vs.xdp_packets += done;
 	u64_stats_update_end(&rq->stats.syncp);

-	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
+	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq))) {
+		txq_trans_cond_update(peer_txq);
 		netif_tx_wake_queue(peer_txq);
+	}

 	return done;
 }
@@ -1373,6 +1375,16 @@ static int veth_set_channels(struct net_device *dev,
 	goto out;
 }

+static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
+{
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
+
+	netdev_err(dev, "veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
+		   atomic_long_read(&txq->trans_timeout), txqueue);
+
+	netif_tx_wake_queue(txq);
+}
+
 static int veth_open(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -1711,6 +1723,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_bpf = veth_xdp,
 	.ndo_xdp_xmit = veth_ndo_xdp_xmit,
 	.ndo_get_peer_dev = veth_peer_dev,
+	.ndo_tx_timeout = veth_tx_timeout,
 };

 static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
@@ -1749,6 +1762,7 @@ static void veth_setup(struct net_device *dev)
 	dev->priv_destructor = veth_dev_free;
 	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
 	dev->max_mtu = ETH_MAX_MTU;
+	dev->watchdog_timeo = msecs_to_jiffies(5000);

 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;