Message-ID: <20260126204442.4ae27de4@kernel.org>
Date: Mon, 26 Jan 2026 20:44:42 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>, "David S.
Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Paolo
Abeni <pabeni@...hat.com>, Wen Gu <guwen@...ux.alibaba.com>, Philo Lu
<lulie@...ux.alibaba.com>, Lorenzo Bianconi <lorenzo@...nel.org>, Vadim
Fedorenko <vadim.fedorenko@...ux.dev>, Dong Yibo <dong100@...se.com>,
Heiner Kallweit <hkallweit1@...il.com>, Lukas Bulwahn
<lukas.bulwahn@...hat.com>, Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: [PATCH net-next v22 4/6] eea: create/destroy rx,tx queues for
netdevice open and stop
On Thu, 22 Jan 2026 20:05:06 +0800 Xuan Zhuo wrote:
> +int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_init_ctx *ctx)
> +{
> + int err;
> +
> + if (!netif_running(enet->netdev)) {
> + enet->cfg = ctx->cfg;
> + return 0;
> + }
> +
> + err = eea_alloc_rxtx_q_mem(ctx);
> + if (err) {
> + netdev_warn(enet->netdev,
> + "eea reset: alloc q failed. stop reset. err %d\n",
> + err);
> + return err;
> + }
> +
> + eea_netdev_stop(enet->netdev);
[1]
> + enet_bind_new_q_and_cfg(enet, ctx);
> +
> + err = eea_active_ring_and_irq(enet);
> + if (err) {
> + /* Although the notification to hardware or the initial IRQ
> + * setup has failed (which is, of course, a very low-probability
> + * event), we do not immediately free the queues resources here.
> + * Instead, we defer their release until the normal NIC cleanup,
> + * or until the user or hardware triggers a reset operation.
> + * Because that the dev is running.
> + */
> + netdev_err(enet->netdev,
> + "eea reset: active new ring and irq failed. err %d\n",
> + err);
> + return err;
> + }
> +
> + err = eea_start_rxtx(enet->netdev);
> + if (err)
> + netdev_err(enet->netdev,
> + "eea reset: start queue failed. err %d\n", err);
This looks questionable. eea_start_rxtx() can only fail if
netif_set_real_num_queues() fails, which in turn may fail
for sysfs-related reasons. So we're not talking about "rare
HW failure" scenarios in this case. Memory pressure or a name
collision would be enough.

Seems like this is easy to address. Move
netif_set_real_num_queues() out of eea_start_rxtx().
This way eea_start_rxtx() can return void (it can't fail).
In this function you can move the netif_set_real_num_queues()
call to where I placed the [1] marker, before any changes are
actually made to the adapter.

Looking closer, eea_active_ring_and_irq() also calls kernel
registration functions which may fail, like request_irq().
You should do all of that before you actually shut down the adapter.
> + return err;
> +}
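To illustrate the ordering I have in mind, here is a rough user-space
sketch, not driver code. Every mock_* name and both structs are made up
for the example and do not correspond to the eea API; the two fail_*
flags only simulate request_irq() and netif_set_real_num_queues()
returning an error. The point is that all fallible steps run while the
old configuration is still live, so an error leaves the device as it was.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; not the real eea structures. */
struct mock_dev {
	bool running;     /* adapter is passing traffic */
	int real_num_q;   /* queue count registered with the stack */
	int irqs;         /* IRQs bound to the active rings */
};

struct mock_ctx {
	int new_num_q;          /* requested queue count */
	int irqs_ready;         /* IRQs obtained for the new rings */
	bool fail_request_irq;  /* simulate request_irq() failing */
	bool fail_set_queues;   /* simulate netif_set_real_num_queues() failing */
};

static int mock_request_irqs(struct mock_ctx *ctx)
{
	if (ctx->fail_request_irq)
		return -EBUSY;
	ctx->irqs_ready = ctx->new_num_q;
	return 0;
}

static void mock_free_irqs(struct mock_ctx *ctx)
{
	ctx->irqs_ready = 0;
}

static int mock_set_real_num_queues(struct mock_dev *dev,
				    const struct mock_ctx *ctx)
{
	if (ctx->fail_set_queues)
		return -ENOMEM;
	dev->real_num_q = ctx->new_num_q;
	return 0;
}

/* Infallible by construction: only moves already-acquired resources. */
static void mock_stop_bind_start(struct mock_dev *dev, struct mock_ctx *ctx)
{
	assert(ctx->irqs_ready == ctx->new_num_q);
	dev->running = false;          /* stop the old rings */
	dev->irqs = ctx->irqs_ready;   /* bind the new rings and IRQs */
	ctx->irqs_ready = 0;
	dev->running = true;           /* restart; returns void */
}

int mock_reset_hw_resources(struct mock_dev *dev, struct mock_ctx *ctx)
{
	int err;

	/* Phase 1: everything that can fail, with the old setup untouched. */
	err = mock_request_irqs(ctx);
	if (err)
		return err;

	err = mock_set_real_num_queues(dev, ctx);
	if (err) {
		mock_free_irqs(ctx);
		return err;
	}

	/* Phase 2: point of no return; nothing below can fail. */
	mock_stop_bind_start(dev, ctx);
	return 0;
}
```

In the mock, a failure in either phase-1 step returns with the device
still running on its old queue count and IRQs. Whether the real driver
can hold IRQs for the new rings while the old ones are still active is
something I have not checked, so treat the exact split as an assumption.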
--
pw-bot: cr