Message-ID: <7bcc0f14-0005-4d72-9bc3-a32304499630@intel.com>
Date: Thu, 11 Dec 2025 15:59:07 -0800
From: Tony Nguyen <anthony.l.nguyen@...el.com>
To: Aaron Ma <aaron.ma@...onical.com>, <przemyslaw.kitszel@...el.com>,
<andrew+netdev@...n.ch>, <davem@...emloft.net>, <edumazet@...gle.com>,
<kuba@...nel.org>, <pabeni@...hat.com>, <intel-wired-lan@...ts.osuosl.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/2] ice: Initialize RDMA after rebuild
On 12/5/2025 12:24 AM, Aaron Ma wrote:
> After wakeup from suspend, IRDMA is initialized with error:
>
> kernel: ice 0000:60:00.0: IRDMA hardware initialization FAILED init_state=4 status=-110
> kernel: ice 0000:60:00.1: IRDMA hardware initialization FAILED init_state=4 status=-110
> kernel: irdma.gen_2 ice.roce.1: probe with driver irdma.gen_2 failed with error -110
> kernel: irdma.gen_2 ice.roce.2: probe with driver irdma.gen_2 failed with error -110
>
> IRDMA times out because the initialization before the schedule reset.
> The ice_init_rdma() function already calls ice_plug_aux_dev() internally,
> ensuring proper initialization order.
>
> Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume")
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@...el.com>
> Signed-off-by: Aaron Ma <aaron.ma@...onical.com>
> ---
> V1 -> V2: no changes.
>
> drivers/net/ethernet/intel/ice/ice_main.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 2533876f1a2fd..c6dd04d24ac09 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5677,11 +5677,6 @@ static int ice_resume(struct device *dev)
> if (ret)
> dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
>
> - ret = ice_init_rdma(pf);
> - if (ret)
> - dev_err(dev, "Reinitialize RDMA during resume failed: %d\n",
> - ret);
> -
> clear_bit(ICE_DOWN, pf->state);
> /* Now perform PF reset and rebuild */
> reset_type = ICE_RESET_PFR;
> @@ -7805,7 +7800,12 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type)
>
> ice_health_clear(pf);
>
> - ice_plug_aux_dev(pf);
> + /* Initialize RDMA after control queues are ready */
> + err = ice_init_rdma(pf);

ice_init_rdma() allocates a new pf->cdev_info on each call. While this
works for this particular flow, ice_rebuild() is called for all reset
paths, so it can leak cdev_info because RDMA is not de-initialized on
resets.
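
To make the leak concern concrete, here is a small userspace model, not
driver code. Every leak_* name is hypothetical, and the allocation
behaviour is only assumed to match the description above (a fresh
allocation on each init, nothing freed on reset).

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real ice structures. */
struct leak_cdev_info { int placeholder; };

struct leak_pf {
	struct leak_cdev_info *cdev_info;
	int live_allocs;	/* allocations not yet freed */
};

/* Models an init that allocates unconditionally on every call. */
static int leak_init_rdma(struct leak_pf *pf)
{
	struct leak_cdev_info *info = calloc(1, sizeof(*info));

	if (!info)
		return -1;
	pf->cdev_info = info;	/* any previous pointer is overwritten */
	pf->live_allocs++;
	return 0;
}

/* Models a deinit that frees only what pf currently points at. */
static void leak_deinit_rdma(struct leak_pf *pf)
{
	if (!pf->cdev_info)
		return;
	free(pf->cdev_info);
	pf->cdev_info = NULL;
	pf->live_allocs--;
}

/*
 * Probe, then one reset that re-runs init without a deinit, then a
 * single deinit. Returns how many allocations the deinit could not
 * reach, or -1 if an allocation failed.
 */
int leak_demo_orphans(void)
{
	struct leak_pf pf = {0};
	struct leak_cdev_info *first;
	int orphans;

	if (leak_init_rdma(&pf))		/* probe */
		return -1;
	first = pf.cdev_info;

	if (leak_init_rdma(&pf)) {		/* rebuild after a reset */
		leak_deinit_rdma(&pf);
		return -1;
	}

	leak_deinit_rdma(&pf);			/* frees the second one only */
	orphans = pf.live_allocs;
	free(first);				/* the model cleans up by hand */
	return orphans;
}
```

In this model each additional reset would strand one more allocation,
since nothing between resets releases the previous pointer.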

Additionally, ice_init_rdma() seems well placed in ice_resume(), where
it mirrors the deinit in ice_suspend(). As you mentioned, the problem is
caused by the plug occurring before a reset. I think the call to
ice_plug_aux_dev() should be removed from ice_init_rdma() to stop this
from happening. With that change, the plug won't occur before a reset
and, following the reset, plug will be called as part of rebuild when
everything is up and ready. As ice_init_rdma() is also called in one
other location (probe), ice_plug_aux_dev() should be added after the
RDMA init there to preserve the current flow.

Corresponding changes should be made to the cleanup path as well: mirror
the removal of ice_plug_aux_dev() from ice_init_rdma() by removing
ice_unplug_aux_dev() from ice_deinit_rdma(), and precede each call to
ice_deinit_rdma() with ice_unplug_aux_dev().
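
As a rough illustration of the ordering I have in mind, the following is
a userspace toy model rather than a patch. All model_* names and the
state flags are hypothetical, and it assumes control queues are ready
during probe and after rebuild.

```c
#include <stdbool.h>

/* Toy state; none of this is the real ice driver code. */
struct model_pf {
	bool ctrlq_ready;	/* assumed true once probe/rebuild is done */
	bool rdma_inited;
	bool plugged;
	int early_plugs;	/* plugs made while ctrlq_ready was false */
};

static void model_plug_aux_dev(struct model_pf *pf)
{
	if (!pf->rdma_inited)
		return;
	if (!pf->ctrlq_ready)
		pf->early_plugs++;
	pf->plugged = true;
}

static void model_unplug_aux_dev(struct model_pf *pf)
{
	pf->plugged = false;
}

/* Init no longer plugs; deinit no longer unplugs. */
static void model_init_rdma(struct model_pf *pf)
{
	pf->rdma_inited = true;
}

static void model_deinit_rdma(struct model_pf *pf)
{
	pf->rdma_inited = false;
}

/* Rebuild plugs only once everything is up. */
static void model_rebuild(struct model_pf *pf)
{
	pf->ctrlq_ready = true;
	model_plug_aux_dev(pf);
}

/* Probe keeps today's flow by plugging right after init. */
static void model_probe(struct model_pf *pf)
{
	pf->ctrlq_ready = true;	/* assumption for this model */
	model_init_rdma(pf);
	model_plug_aux_dev(pf);
}

/* Every deinit is preceded by an explicit unplug. */
static void model_suspend(struct model_pf *pf)
{
	model_unplug_aux_dev(pf);
	model_deinit_rdma(pf);
	pf->ctrlq_ready = false;
}

/* Resume inits (mirroring suspend) and leaves the plug to rebuild. */
static void model_resume(struct model_pf *pf)
{
	model_init_rdma(pf);
	model_rebuild(pf);	/* stands in for the PF reset + rebuild */
}

/* Returns the number of early plugs after one cycle, or -1 if unplugged. */
int model_early_plugs_after_cycle(void)
{
	struct model_pf pf = {0};

	model_probe(&pf);
	model_suspend(&pf);
	model_resume(&pf);
	return pf.plugged ? pf.early_plugs : -1;
}
```

In the model, a probe/suspend/resume cycle ends with the aux device
plugged and no plug attempted before the rebuild. If model_init_rdma()
also plugged, the resume path would record an early plug, which is the
failure described in the commit message.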

Thanks,
Tony

> + if (err)
> + dev_err(dev, "Reinitialize RDMA after rebuild failed: %d\n",
> + err);
> +
> if (ice_is_feature_supported(pf, ICE_F_SRIOV_LAG))
> ice_lag_rebuild(pf);
>