lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 11 Jun 2024 08:58:05 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>, "Kang, Kelvin"
	<kelvin.kang@...el.com>
CC: "Sokolowski, Jan" <jan.sokolowski@...el.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "Kubalewski, Arkadiusz"
	<arkadiusz.kubalewski@...el.com>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>, "intel-wired-lan@...ts.osuosl.org"
	<intel-wired-lan@...ts.osuosl.org>, Leon Romanovsky <leonro@...dia.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue NVM content
 is corrupted after nvmupdate



> -----Original Message-----
> From: Paul Menzel <pmenzel@...gen.mpg.de>
> Sent: Monday, June 10, 2024 2:26 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@...el.com>; Kang, Kelvin
> <kelvin.kang@...el.com>
> Cc: Sokolowski, Jan <jan.sokolowski@...el.com>; netdev@...r.kernel.org;
> Kubalewski, Arkadiusz <arkadiusz.kubalewski@...el.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; intel-wired-lan@...ts.osuosl.org; Leon
> Romanovsky <leonro@...dia.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net] i40e: fix hot issue NVM
> content is corrupted after nvmupdate
> 
> Dear Aleksandr,
> 
> 
> Thank you for your response.
> 
> 
> Am 10.06.24 um 12:20 schrieb Loktionov, Aleksandr:
> >
> >
> >> -----Original Message-----
> >> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf
> >> Of Loktionov, Aleksandr
> >> Sent: Monday, June 10, 2024 12:16 PM
> 
> >>> -----Original Message-----
> >>> From: Loktionov, Aleksandr
> >>> Sent: Monday, June 10, 2024 12:14 PM
> 
> >>>> -----Original Message-----
> >>>> From: Paul Menzel <pmenzel@...gen.mpg.de>
> >>>> Sent: Monday, June 10, 2024 11:45 AM
> 
> >>>> Am 10.06.24 um 11:20 schrieb Aleksandr Loktionov:
> >>>>> After 230f3d53a547 patch all I/O errors are being converted into
> >>>>> EAGAIN which leads to retries until timeout so nvmupdate sometimes
> >>>>> fails after more than 20 minutes!
> >>>>>
> >>>>> Remove misleading EIO to EGAIN conversion and pass all errors as
> >>>> is.
> >>>>>
> >>>>> Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> >>>>
> >>>> This commit is present since v6.6-rc1, released September last year
> >>>> (2023). So until now, nobody noticed this?
> >>>>
> >>> Really, really. The regression affects users only when they update
> >>> F/W, and not all F/W are affected, only that generate I/O errors
> >>> while update.
> >> Not all the cards are affected, but the consequences are serous as in
> >> subj.
> >>
> >>>>> Co-developed-by: Kelvin Kang <kelvin.kang@...el.com>
> >>>>> Signed-off-by: Kelvin Kang <kelvin.kang@...el.com>
> >>>>> Reviewed-by: Arkadiusz Kubalewski
> >>>> <arkadiusz.kubalewski@...el.com>
> >>>>> Signed-off-by: Aleksandr Loktionov
> >>>> <aleksandr.loktionov@...el.com>
> >>>>
> >>>> Please give more details about your test setup. For me it’s also
> >>>> not clear, how the NVM content gets corrupted as stated in the
> >>>> summary/title. Could you please elaborate that in the commit message.
> 
> > For example X710DA2 with 0x8000ECB7 is affected, but there are
> > probably more...
> 
> Please amend the commit message with this information, and for ease also the
> commands you executed.
> 
> > The corruption is already described - because of timeout nvmupdate
> > timeouts failing to update NVM.
> 
> Only because something times out, does not mean it causes corruption.
> 
> Please amend the commit message. It looks like you also missed more of my
> questions below.
> 
> >>>>> ---
> >>>>>    drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
> >>>>>    1 file changed, 4 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>>> b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>>> index ee86d2c..55b5bb8 100644
> >>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>>> @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret,
> int aq_rc)
> >>>>>    		-EFBIG,      /* I40E_AQ_RC_EFBIG */
> >>>>>    	};
> >>>>>
> >>>>> -	/* aq_rc is invalid if AQ timed out */
> >>>>> -	if (aq_ret == -EIO)
> >>>>> -		return -EAGAIN;
> >>>>> -
> >>>>>    	if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> sizeof((aq_to_posix)[0]))))
> >>>>>    		return -ERANGE;
> >>>>
> >>>> The referenced commit 230f3d53a547 does:
> >>>>
> >>>> ```
> >>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>> b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>> index ee394aacef4d..267f2e0a21ce 100644
> >>>> --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> >>>> @@ -5,7 +5,6 @@
> >>>>    #define _I40E_ADMINQ_H_
> >>>>
> >>>>    #include "i40e_osdep.h"
> >>>> -  #include "i40e_status.h"
> >>>>    #include "i40e_adminq_cmd.h"
> >>>>
> >>>>    #define I40E_ADMINQ_DESC(R, i)   \
> >>>> @@ -117,7 +116,7 @@ static inline int i40e_aq_rc_to_posix(int aq_ret,
> int aq_rc)
> >>>>           };
> >>>>
> >>>>           /* aq_rc is invalid if AQ timed out */
> >>>> -       if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
> >>>> +       if (aq_ret == -EIO)
> >>>>                   return -EAGAIN;
> >>>>
> >>>>           if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> >>>> sizeof((aq_to_posix)[0])))) ```
> >>>>
> >>>> So I do not see yet, why removing the whole hunk is the solution.
The solution is to pass errors as mentioned in the commit as is from f/w to nvmupdate.

> Kind regards,
> 
> Paul
> 
> 
> PS: Would it be possible, that you use another email program. The cited
> parts are wrapped awkwardly, and it takes some time to correct it in my
> response.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ