linux-kernel - Re: [PATCH] pci/doe: add a 1 second retry window to pci

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zvwdmy43AUELFzV4@PC2K9PVX.TheFacebook.com>
Date: Tue, 1 Oct 2024 12:04:43 -0400
From: Gregory Price <gourry@...rry.net>
To: Lukas Wunner <lukas@...ner.de>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Dan Williams <dan.j.williams@...el.com>, linux-pci@...r.kernel.org,
	linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
	bhelgaas@...gle.com, dave@...olabs.net, dave.jiang@...el.com,
	vishal.l.verma@...el.com
Subject: Re: [PATCH] pci/doe: add a 1 second retry window to pci_doe

On Tue, Oct 01, 2024 at 05:47:03PM +0200, Lukas Wunner wrote:
> On Tue, Oct 01, 2024 at 11:13:17AM -0400, Gregory Price wrote:
> > > > Gregory Price wrote:
> > > > > Depending on the device, sometimes firmware clears the busy flag
> > > > > later than expected.  This can cause the device to appear busy when
> > > > > calling multiple commands in quick sucession. Add a 1 second retry
> > > > > window to all doe commands that end with -EBUSY.  
> > 
> > Just following up here, it sounds like everyone is unsure of this change.
> > 
> > I can confirm that this handles the CDAT retry issue I am seeing, and that
> > the BUSY bit is set upon entry into the initial call. Only 1 or 2 retries
> > are attempted before it is cleared and returns successfully.
> > 
> > I'd explored putting the retry logic in the CDAT code that calls into here,
> > but that just seemed wrong.  Is there a suggestion or a nak here?
> > 
> > Trying to find a path forward.
> 
> The PCIe Base Spec doesn't prescribe a maximum timeout for the
> DOE BUSY bit to clear.  Thus it seems fine to me in principle
> to add a (or raise the) timeout if it turns out to be necessary
> for real-life hardware.
> 
> That said, the proposed patch has room for improvement:

Will address and resubmit, thanks!

> 
> * The patch seems to wait for DOE BUSY bit to clear *after*
>   completion.  That's odd.  The kernel waits for DOE Busy bit
>   to clear *before* sending a new request, in pci_doe_send_req().
>   My expectation would have been that you'd add a loop there which
>   polls for DOE Busy bit to clear before sending a request.
> 
>   It seems that polling is the only option as no interrupt is
>   raised on DOE Busy bit clear, per PCIe r6.2 sec 6.30.3.
>   (Please add this bit of information to the commit message.)
> 
> * The commit message should clearly specify the device(s)
>   affected by the issue (Vendor and Device ID plus name).
>   Comments such as "Depending on the device, sometimes ..."
>   are a little too vague.
> 
> * The "1 or 2 retries" bit of information you're mentioning
>   above should likewise be in the commit message.
> 
> * Please use "PCI/DOE:" as subject prefix to match previous
>   commits which touched drivers/pci/doe.c.
> 
> * Please adhere to spec language, e.g. use "DOE Busy bit"
>   instead of "busy bit" so it's unambiguous for readers
>   what you're referring to.
> 
> Thanks,
> 
> Lukas