lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SJ0PR11MB58660E94611961DC52634069E5C62@SJ0PR11MB5866.namprd11.prod.outlook.com>
Date: Mon, 10 Jun 2024 10:20:07 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>, Paul Menzel
	<pmenzel@...gen.mpg.de>, "Kang, Kelvin" <kelvin.kang@...el.com>
CC: "Sokolowski, Jan" <jan.sokolowski@...el.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "Kubalewski, Arkadiusz"
	<arkadiusz.kubalewski@...el.com>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>, "intel-wired-lan@...ts.osuosl.org"
	<intel-wired-lan@...ts.osuosl.org>, Leon Romanovsky <leonro@...dia.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue NVM content
 is corrupted after nvmupdate



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On
> Behalf Of Loktionov, Aleksandr
> Sent: Monday, June 10, 2024 12:16 PM
> To: Paul Menzel <pmenzel@...gen.mpg.de>; Kang, Kelvin
> <kelvin.kang@...el.com>
> Cc: Sokolowski, Jan <jan.sokolowski@...el.com>;
> netdev@...r.kernel.org; Kubalewski, Arkadiusz
> <arkadiusz.kubalewski@...el.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; intel-wired-lan@...ts.osuosl.org;
> Leon Romanovsky <leonro@...dia.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue
> NVM content is corrupted after nvmupdate
> 
> 
> 
> > -----Original Message-----
> > From: Loktionov, Aleksandr
> > Sent: Monday, June 10, 2024 12:14 PM
> > To: Paul Menzel <pmenzel@...gen.mpg.de>; Kang, Kelvin
> > <kelvin.kang@...el.com>
> > Cc: intel-wired-lan@...ts.osuosl.org; Nguyen, Anthony L
> > <anthony.l.nguyen@...el.com>; netdev@...r.kernel.org; Kubalewski,
> > Arkadiusz <Arkadiusz.Kubalewski@...el.com>; Sokolowski, Jan
> > <jan.sokolowski@...el.com>; Leon Romanovsky <leonro@...dia.com>
> > Subject: RE: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot
> issue NVM
> > content is corrupted after nvmupdate
> >
> >
> >
> > > -----Original Message-----
> > > From: Paul Menzel <pmenzel@...gen.mpg.de>
> > > Sent: Monday, June 10, 2024 11:45 AM
> > > To: Loktionov, Aleksandr <aleksandr.loktionov@...el.com>; Kang,
> > Kelvin
> > > <kelvin.kang@...el.com>
> > > Cc: intel-wired-lan@...ts.osuosl.org; Nguyen, Anthony L
> > > <anthony.l.nguyen@...el.com>; netdev@...r.kernel.org;
> Kubalewski,
> > > Arkadiusz <arkadiusz.kubalewski@...el.com>; Sokolowski, Jan
> > > <jan.sokolowski@...el.com>; Leon Romanovsky <leonro@...dia.com>
> > > Subject: Re: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot
> > issue NVM
> > > content is corrupted after nvmupdate
> > >
> > > Dear Aleksandr, dear Kelvin,
> > >
> > >
> > > Thank you for your patch.
> > >
> > >
> > > Am 10.06.24 um 11:20 schrieb Aleksandr Loktionov:
> > > > After 230f3d53a547 patch all I/O errors are being converted
> > into
> > > > EAGAIN which leads to retries until timeout so nvmupdate
> > > sometimes
> > > > fails after more than 20 minutes!
> > > >
> > > > Remove misleading EIO to EGAIN conversion and pass all errors
> > as
> > > is.
> > > >
> > > > Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> > >
> > > This commit is present since v6.6-rc1, released September last
> > year
> > > (2023). So until now, nobody noticed this?
> > >
> > Really, really. The regression affects users only when they
> update
> > F/W, and not all F/W are affected, only that generate I/O errors
> while
> > update.
> Not all the cards are affected, but the consequences are serous as
> in subj.
> 
> >
> > > > Co-developed-by: Kelvin Kang <kelvin.kang@...el.com>
> > > > Signed-off-by: Kelvin Kang <kelvin.kang@...el.com>
> > > > Reviewed-by: Arkadiusz Kubalewski
> > > <arkadiusz.kubalewski@...el.com>
> > > > Signed-off-by: Aleksandr Loktionov
> > > <aleksandr.loktionov@...el.com>
> > >
> > > Please give more details about your test setup. For me it’s
> also
> > not
> > > clear, how the NVM content gets corrupted as stated in the
> > > summary/title. Could you please elaborate that in the commit
> > message.
For example X710DA2 with 0x8000ECB7 is affected, but there are probably more...
The corruption is already described - because of timeout nvmupdate timeouts failing to update NVM. 

> > >
> > > > ---
> > > >   drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
> > > >   1 file changed, 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > > index ee86d2c..55b5bb8 100644
> > > > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > > @@ -109,10 +109,6 @@ static inline int
> i40e_aq_rc_to_posix(int
> > > aq_ret, int aq_rc)
> > > >   		-EFBIG,      /* I40E_AQ_RC_EFBIG */
> > > >   	};
> > > >
> > > > -	/* aq_rc is invalid if AQ timed out */
> > > > -	if (aq_ret == -EIO)
> > > > -		return -EAGAIN;
> > > > -
> > > >   	if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> > > sizeof((aq_to_posix)[0]))))
> > > >   		return -ERANGE;
> > >
> > > The referenced commit 230f3d53a547 does:
> > >
> > > ```
> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > index ee394aacef4d..267f2e0a21ce 100644
> > > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > > @@ -5,7 +5,6 @@
> > >   #define _I40E_ADMINQ_H_
> > >
> > >   #include "i40e_osdep.h"
> > > -#include "i40e_status.h"
> > >   #include "i40e_adminq_cmd.h"
> > >
> > >   #define I40E_ADMINQ_DESC(R, i)   \
> > > @@ -117,7 +116,7 @@ static inline int i40e_aq_rc_to_posix(int
> > aq_ret,
> > > int aq_rc)
> > >          };
> > >
> > >          /* aq_rc is invalid if AQ timed out */
> > > -       if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
> > > +       if (aq_ret == -EIO)
> > >                  return -EAGAIN;
> > >
> > >          if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> > > sizeof((aq_to_posix)[0]))))
> > > ```
> > >
> > > So I do not see yet, why removing the whole hunk is the
> solution.
> > >
> > >
> > > Kind regards,
> > >
> > > Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ