Message-ID: <20251126200541.00e5270f@kernel.org>
Date: Wed, 26 Nov 2025 20:05:41 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Dipayaan Roy <dipayanroy@...ux.microsoft.com>
Cc: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
decui@...rosoft.com, andrew+netdev@...n.ch, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com, longli@...rosoft.com,
kotaranov@...rosoft.com, horms@...nel.org,
shradhagupta@...ux.microsoft.com, ssengar@...ux.microsoft.com,
ernis@...ux.microsoft.com, shirazsaleem@...rosoft.com,
linux-hyperv@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
dipayanroy@...rosoft.com
Subject: Re: [PATCH net-next, v4] net: mana: Implement ndo_tx_timeout and
serialize queue resets per port.
On Sun, 23 Nov 2025 10:08:18 -0800 Dipayaan Roy wrote:
> Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
> and a device-controlled port reset for all queues can be scheduled to an
> ordered workqueue. The reset for all queues on stall detection is
> recommended by the hardware team.
>
> The change introduces a single ordered workqueue
> "mana_per_port_queue_reset_wq" queuing one work_struct per port,
> using WQ_UNBOUND | WQ_MEM_RECLAIM so stalled queue reset work can
> run on any CPU and still make forward progress under memory
> pressure.
And we need to be able to reset the NIC queue under memory pressure
because...? I could be wrong, but I still find this unusual / defensive
programming. If you could point me at some existing drivers, that'd help.
> @@ -3287,6 +3341,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
> ndev->min_mtu = ETH_MIN_MTU;
> ndev->needed_headroom = MANA_HEADROOM;
> ndev->dev_port = port_idx;
> + ndev->watchdog_timeo = 15 * HZ;
5 sec is typical, off the top of my head
> @@ -3647,6 +3717,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
> free_netdev(ndev);
> }
>
> + if (ac->per_port_queue_reset_wq) {
> + destroy_workqueue(ac->per_port_queue_reset_wq);
> + ac->per_port_queue_reset_wq = NULL;
> + }
I think you're missing this cleanup in the failure path of mana_probe
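
For illustration, the kind of error-path unwind being asked for can be
sketched in plain userspace C. This is not the actual mana_probe code:
every name below (fake_ctx, fake_probe, fake_alloc_wq, and so on) is a
hypothetical stand-in, and the "later step" that fails is simulated by a
flag. It only shows that a resource allocated early in probe is released
both on the failure path and in remove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: none of these are real kernel or MANA symbols. */
struct fake_wq { int unused; };

struct fake_ctx {
	struct fake_wq *reset_wq;	/* models ac->per_port_queue_reset_wq */
};

static int live_wqs;	/* number of workqueues currently allocated */

static struct fake_wq *fake_alloc_wq(void)
{
	struct fake_wq *wq = calloc(1, sizeof(*wq));

	if (wq)
		live_wqs++;
	return wq;
}

static void fake_destroy_wq(struct fake_wq *wq)
{
	assert(wq != NULL);
	free(wq);
	live_wqs--;
}

/* Models probe: allocate the workqueue, then run a later step that may fail. */
static int fake_probe(struct fake_ctx *ctx, bool later_step_fails)
{
	int err;

	ctx->reset_wq = fake_alloc_wq();
	if (!ctx->reset_wq)
		return -1;

	err = later_step_fails ? -1 : 0;
	if (err)
		goto destroy_wq;

	return 0;

destroy_wq:
	/* Failure path: undo the allocation so nothing leaks. */
	fake_destroy_wq(ctx->reset_wq);
	ctx->reset_wq = NULL;
	return err;
}

/* Models remove: same cleanup, guarded so it is safe to call after a failed probe. */
static void fake_remove(struct fake_ctx *ctx)
{
	if (ctx->reset_wq) {
		fake_destroy_wq(ctx->reset_wq);
		ctx->reset_wq = NULL;
	}
}
```

Calling fake_probe() with the failure flag set should leave live_wqs at
zero and the pointer NULL, so a later fake_remove() is a no-op.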
--
pw-bot: cr