Message-ID: <YCQT90mK1kacZ7ZA@rocinante>
Date:   Wed, 10 Feb 2021 18:12:36 +0100
From:   Krzysztof Wilczyński <kw@...ux.com>
To:     Qiuxu Zhuo <qiuxu.zhuo@...el.com>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Sean V Kelley <sean.v.kelley@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>, "Jin, Wen" <wen.jin@...el.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] PCI/RCEC: Fix failure to inject errors to some RCiEP
 devices

Hi Qiuxu,

Nice catch!  Thank you for sending the fix over!

[...]
> On a Sapphire Rapids server, it failed to inject correctable errors
> to the RCiEP device e8:02.0 which was associated with the RCEC device
> e8:00.4. See the following error log before applying the patch:
> 
> aer-inject -s e8:02.0 examples/correctable
> Error: Failed to write, No such device
> 
> This was because rcec_assoc_rciep() mistakenly used "rciep->devfn" as
> device number to check whether the corresponding bit was set in
> the RCiEPBitmap of the RCEC. So that the RCiEP device e8:02.0 wasn't
> linked to the RCEC and resulted in the above error.
> 
> Fix it by using PCI_SLOT() to convert rciep->devfn to device number.
> Ensure that the RCiEP devices associated with the RCEC are linked to
> the RCEC as the RCEC is enumerated. After applying the patch, correctable
> errors can be injected to the RCiEP successfully.
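
For readers following along, here is a minimal userspace sketch of the mix-up described above. It is not the kernel code: the macros are redefined locally with the same encoding as the kernel's PCI_DEVFN() and PCI_SLOT(), the helper names assoc_buggy() and assoc_fixed() are made up for illustration, and the bitmap value is an assumed example in which only device 2 is flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the devfn encoding: device in bits 7:3, function in bits 2:0. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/* Hypothetical helper: treats the packed devfn as if it were a device number. */
static inline bool assoc_buggy(uint32_t bitmap, unsigned int devfn)
{
	return devfn < 32 && (bitmap & (UINT32_C(1) << devfn));
}

/* Hypothetical helper: extracts the device number before testing the bitmap. */
static inline bool assoc_fixed(uint32_t bitmap, unsigned int devfn)
{
	return bitmap & (UINT32_C(1) << PCI_SLOT(devfn));
}

/* Walks through the e8:02.0 case with an assumed bitmap. */
static inline void self_check(void)
{
	uint32_t bitmap = UINT32_C(1) << 2;     /* assumption: only device 2 is set */
	unsigned int devfn = PCI_DEVFN(2, 0);   /* 02.0 packs to 0x10 */

	assert(!assoc_buggy(bitmap, devfn));    /* tests bit 16, so no match */
	assert(assoc_fixed(bitmap, devfn));     /* tests bit 2, so it matches */
}
```

With function 0 of device 2, the packed devfn is 0x10, so the unconverted check looks at bit 16 rather than bit 2. Only a device 0 function 0 RCiEP would have a devfn that happens to equal its device number.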

Would this affect only error injection, or is it a generic problem in
the driver that would cause issues for this particular device regardless
of whether errors are being injected?  I am asking, as there is a lot
going on in the commit message.

I wonder if simplifying this commit message so that it clearly explains
what was broken, why, and how this patch is fixing it, would perhaps be
an option?  The backstory of how you found the issue while doing some
testing and error injection is nice, but I am not sure it is needed.

What do you think?

Krzysztof
