lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 18 Jun 2024 14:21:21 +0000
From: "Loktionov, Aleksandr" <aleksandr.loktionov@...el.com>
To: "Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Kang, Kelvin"
	<kelvin.kang@...el.com>, "Kubalewski, Arkadiusz"
	<arkadiusz.kubalewski@...el.com>, "intel-wired-lan@...ts.osuosl.org"
	<intel-wired-lan@...ts.osuosl.org>
Subject: RE: [Intel-wired-lan] [PATCH iwl-net v4] i40e: fix hot issue NVM
 content is corrupted after nvmupdate



> -----Original Message-----
> From: Kitszel, Przemyslaw <przemyslaw.kitszel@...el.com>
> Sent: Tuesday, June 18, 2024 3:50 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@...el.com>; Nguyen,
> Anthony L <anthony.l.nguyen@...el.com>
> Cc: netdev@...r.kernel.org; Kang, Kelvin <kelvin.kang@...el.com>;
> Kubalewski, Arkadiusz <arkadiusz.kubalewski@...el.com>; intel-wired-
> lan@...ts.osuosl.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net v4] i40e: fix hot issue
> NVM content is corrupted after nvmupdate
> 
> On 6/18/24 15:21, Aleksandr Loktionov wrote:
> > The bug affects users only at the time when they try to update NVM,
> > and only F/W versions that generate errors while nvmupdate. For
> > example, X710DA2 with 0x8000ECB7 F/W is affected, but there are
> probably more...
> >
> > Command for reproduction is just NVM update:
> >   ./nvmupdate64
> 
> nvmupdate64 is not an upstream tool, but it's fine to mention it here,
> as we don't have a better alternative for i40e as of now
> 
> >
> > In the log instead of:
> >   i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err
> > I40E_AQ_RC_ENOMEM)
> > appears:
> >   i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM
> >
> > But the problematic code did silently convert EIO into EAGAIN which
> > forced nvmupdate to ignore EAGAIN error and retry the same operation
> until timeout.
> > That's why NVM update takes 20+ minutes to finish with the fail in
> the end.
> 
> this paragraph tells more about nvmupdate tool problems that the
> driver
What is your proposal for it exactly?

> >
> > After commit 230f3d53a547 ("i40e: remove i40e_status"), which should
> > only replace F/W specific error codes with Linux kernel generic,
> 
> > all EIO errors
> > suddenly started to be converted into EAGAIN
> 
> all or just this one that you are fixing here?
All error codes from FW which exactly I'm fixing of course.

> 
> > which leads nvmupdate to retry
> > until it timeouts and sometimes fails after more than 20 minutes in
> > the middle of NVM update, so NVM becomes corrupted.
> >
> > Remove wrong EIO to EGAIN conversion and pass all errors as is.
> >
> > Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> 
> That is a proper tag, and the description makes it clear that we want
> the patch finally applied, thank you for the efforts to make it well
> described.
> 
> > Co-developed-by: Kelvin Kang <kelvin.kang@...el.com>
> > Signed-off-by: Kelvin Kang <kelvin.kang@...el.com>
> 
> this 2 line removal was indeed developed by two of you?
As written above, yes of course.

> > Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@...el.com>
> > Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@...el.com>
> > ---
> > v3->v4 commit message update
> 
> Please fix the subject line too, it's what will be the most frequently
> read part of your work, and with stay in git for decades. You could
> use:
> i40e: fix wrong error code replacement
> 
This patch fixes urgent/hot issue with NVM update, which is stated in the
title. Wrong error conversion code could lead to different severity issues.
From my point it's better to mention in the title the serious problem which
users meet/see, am I wrong?

> and add link in changelog section:
> v4:
> https://lore.kernel.org/netdev/20240618132111.3193963-1-
> aleksandr.loktionov@...el.com/T/#u
> 
> > v2->v3 commit messege typos
> > v1->v2 commit message update
> > ---
> >   drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
> >   1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > index ee86d2c..55b5bb8 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> > @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int
> aq_ret, int aq_rc)
> >   		-EFBIG,      /* I40E_AQ_RC_EFBIG */
> >   	};
> >
> > -	/* aq_rc is invalid if AQ timed out */
> > -	if (aq_ret == -EIO)
> > -		return -EAGAIN;
> > -
> >   	if (!((u32)aq_rc < (sizeof(aq_to_posix) /
> sizeof((aq_to_posix)[0]))))
> >   		return -ERANGE;
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ