[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SJ0PR11MB5866360BDA97A03298A4A637E5C62@SJ0PR11MB5866.namprd11.prod.outlook.com>
Date: Mon, 10 Jun 2024 10:14:20 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>, "Kang, Kelvin"
<kelvin.kang@...el.com>
CC: "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "Kubalewski, Arkadiusz"
<arkadiusz.kubalewski@...el.com>, "Sokolowski, Jan"
<jan.sokolowski@...el.com>, Leon Romanovsky <leonro@...dia.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue NVM content
is corrupted after nvmupdate
> -----Original Message-----
> From: Paul Menzel <pmenzel@...gen.mpg.de>
> Sent: Monday, June 10, 2024 11:45 AM
> To: Loktionov, Aleksandr <aleksandr.loktionov@...el.com>; Kang,
> Kelvin <kelvin.kang@...el.com>
> Cc: intel-wired-lan@...ts.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; netdev@...r.kernel.org; Kubalewski,
> Arkadiusz <arkadiusz.kubalewski@...el.com>; Sokolowski, Jan
> <jan.sokolowski@...el.com>; Leon Romanovsky <leonro@...dia.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue
> NVM content is corrupted after nvmupdate
>
> Dear Aleksandr, dear Kelvin,
>
>
> Thank you for your patch.
>
>
> Am 10.06.24 um 11:20 schrieb Aleksandr Loktionov:
> > After 230f3d53a547 patch all I/O errors are being converted into
> > EAGAIN which leads to retries until timeout so nvmupdate
> sometimes
> > fails after more than 20 minutes!
> >
> > Remove misleading EIO to EGAIN conversion and pass all errors as
> is.
> >
> > Fixes: 230f3d53a547 ("i40e: remove i40e_status")
>
> This commit is present since v6.6-rc1, released September last year
> (2023). So until now, nobody noticed this?
>
Really, really. The regression affects users only when they update F/W,
and not all F/W are affected, only that generate I/O errors while update.
Not cars are affected, but the consequences are serous as in subj.
> > Co-developed-by: Kelvin Kang <kelvin.kang@...el.com>
> > Signed-off-by: Kelvin Kang <kelvin.kang@...el.com>
> > Reviewed-by: Arkadiusz Kubalewski
> <arkadiusz.kubalewski@...el.com>
> > Signed-off-by: Aleksandr Loktionov
> <aleksandr.loktionov@...el.com>
>
> Please give more details about your test setup. For me it’s also
> not clear, how the NVM content gets corrupted as stated in the
> summary/title. Could you please elaborate that in the commit
> message.
>
> > ---
> > drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > index ee86d2c..55b5bb8 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int
> aq_ret, int aq_rc)
> > -EFBIG, /* I40E_AQ_RC_EFBIG */
> > };
> >
> > - /* aq_rc is invalid if AQ timed out */
> > - if (aq_ret == -EIO)
> > - return -EAGAIN;
> > -
> > if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> sizeof((aq_to_posix)[0]))))
> > return -ERANGE;
>
> The referenced commit 230f3d53a547 does:
>
> ```
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> index ee394aacef4d..267f2e0a21ce 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> @@ -5,7 +5,6 @@
> #define _I40E_ADMINQ_H_
>
> #include "i40e_osdep.h"
> -#include "i40e_status.h"
> #include "i40e_adminq_cmd.h"
>
> #define I40E_ADMINQ_DESC(R, i) \
> @@ -117,7 +116,7 @@ static inline int i40e_aq_rc_to_posix(int
> aq_ret, int aq_rc)
> };
>
> /* aq_rc is invalid if AQ timed out */
> - if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
> + if (aq_ret == -EIO)
> return -EAGAIN;
>
> if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> sizeof((aq_to_posix)[0]))))
> ```
>
> So I do not see yet, why removing the whole hunk is the solution.
>
>
> Kind regards,
>
> Paul
Powered by blists - more mailing lists