linux-kernel - Re: [PATCH net V3 1/2] veth: enable dev

Message-ID: <20251106172919.24540443@kernel.org>
Date: Thu, 6 Nov 2025 17:29:19 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: netdev@...r.kernel.org, Toke Høiland-Jørgensen
 <toke@...e.dk>, Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
 <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
 ihor.solodrai@...ux.dev, "Michael S. Tsirkin" <mst@...hat.com>,
 makita.toshiaki@....ntt.co.jp, toshiaki.makita1@...il.com,
 bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, kernel-team@...udflare.com
Subject: Re: [PATCH net V3 1/2] veth: enable dev_watchdog for detecting
 stalled TXQs

On Wed, 05 Nov 2025 18:28:12 +0100 Jesper Dangaard Brouer wrote:
> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
> backpressure on full ptr_ring to reduce TX drops") have been found to cause
> a race condition in production environments.
> 
> Under specific circumstances, observed exclusively on ARM64 (aarch64)
> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
> permanently stalled. This happens when the race condition leads to the TXQ
> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
> preventing the attached qdisc from dequeueing packets and causing the
> network link to halt.
> 
> As a first step towards resolving this issue, this patch introduces a
> failsafe mechanism. It enables the net device watchdog by setting a timeout
> value and implements the .ndo_tx_timeout callback.
> 
> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
> and allow traffic to resume.
> 
> The log message will look like this:
> 
>  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
>  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
> 
> This provides a necessary recovery mechanism while the underlying race
> condition is investigated further. Subsequent patches will address the root
> cause and add more robust state handling.
> 
> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
> Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>
> Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>

I think this belongs in net-next. A failsafe is not really a bug fix.
I'm slightly worried that we're missing a corner case and that this will
cause timeouts to get printed for someone's config.

> +static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
> +{
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> +
> +	netdev_err(dev, "veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
> +		   atomic_long_read(&txq->trans_timeout), txqueue);

If you think trans_timeout is useful, let's add it to the message the
core prints? Then we can make this message veth-specific; I don't
think we should repeat what the core already printed.

> +	netif_tx_wake_queue(txq);
> +}
> +
>  static int veth_open(struct net_device *dev)
>  {
>  	struct veth_priv *priv = netdev_priv(dev);
> @@ -1711,6 +1723,7 @@ static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_bpf		= veth_xdp,
>  	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
>  	.ndo_get_peer_dev	= veth_peer_dev,
> +	.ndo_tx_timeout		= veth_tx_timeout,
>  };
>  
>  static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
> @@ -1749,6 +1762,7 @@ static void veth_setup(struct net_device *dev)
>  	dev->priv_destructor = veth_dev_free;
>  	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
>  	dev->max_mtu = ETH_MAX_MTU;
> +	dev->watchdog_timeo = msecs_to_jiffies(5000);
>  
>  	dev->hw_features = VETH_FEATURES;
>  	dev->hw_enc_features = VETH_FEATURES;
> 
>
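
For readers following along, here is a minimal userspace sketch of the
failsafe described in the quoted commit message: a watchdog notices a
TX queue that has stayed stopped past the timeout, bumps a per-queue
timeout counter, and invokes a driver callback that wakes the queue.
This is not kernel code. All sim_* names, the millisecond clock, and the
two-queue device are hypothetical stand-ins chosen for illustration; only
the 5000 ms timeout and the driver log format come from the patch, and
the watchdog line printed here is shortened relative to the one the core
prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SIM_NUM_TXQ 2
/* Assumption: mirrors dev->watchdog_timeo = msecs_to_jiffies(5000). */
#define SIM_WATCHDOG_TIMEO_MS 5000

struct sim_txq {
	bool stopped;        /* stands in for QUEUE_STATE_DRV_XOFF */
	long trans_start_ms; /* time of the last transmit on this queue */
	long trans_timeout;  /* how many times the watchdog fired */
};

struct sim_dev {
	const char *name;
	struct sim_txq txq[SIM_NUM_TXQ];
	/* stands in for the .ndo_tx_timeout callback */
	void (*tx_timeout)(struct sim_dev *dev, unsigned int txqueue);
};

/* Driver-side recovery: log, then wake the stalled queue. */
static void sim_veth_tx_timeout(struct sim_dev *dev, unsigned int txqueue)
{
	assert(txqueue < SIM_NUM_TXQ);
	struct sim_txq *txq = &dev->txq[txqueue];

	printf("%s: veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
	       dev->name, txq->trans_timeout, txqueue);
	txq->stopped = false; /* stands in for netif_tx_wake_queue() */
}

/*
 * Simplified watchdog pass: find the first queue that is stopped and
 * has not transmitted for longer than the timeout, then call the
 * driver callback for it. Returns the queue index, or -1 if no queue
 * timed out.
 */
static int sim_dev_watchdog(struct sim_dev *dev, long now_ms)
{
	for (unsigned int i = 0; i < SIM_NUM_TXQ; i++) {
		struct sim_txq *txq = &dev->txq[i];
		long idle_ms = now_ms - txq->trans_start_ms;

		if (!txq->stopped || idle_ms <= SIM_WATCHDOG_TIMEO_MS)
			continue;

		txq->trans_timeout++;
		printf("%s: NETDEV WATCHDOG: transmit queue %u timed out %ld ms\n",
		       dev->name, i, idle_ms);
		if (dev->tx_timeout)
			dev->tx_timeout(dev, i);
		return (int)i;
	}
	return -1;
}
```

To try it, initialize a struct sim_dev with tx_timeout set to
sim_veth_tx_timeout, mark txq[0] as stopped with trans_start_ms = 0, and
call sim_dev_watchdog() with a time before and after the 5000 ms mark.
Only the later call should wake the queue.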