Message-ID: <193b55c7998.d66e7e7c1942154.6474606603462169748@collabora.com>
Date: Wed, 11 Dec 2024 10:55:56 +0000
From: Robert Beckett <bob.beckett@...labora.com>
To: "Keith Busch" <kbusch@...nel.org>
Cc: "Pawel Anikiel" <panikiel@...gle.com>, "axboe" <axboe@...nel.dk>,
	"hch" <hch@....de>, "kernel" <kernel@...labora.com>,
	"linux-kernel" <linux-kernel@...r.kernel.org>,
	"linux-nvme" <linux-nvme@...ts.infradead.org>,
	"sagi" <sagi@...mberg.me>
Subject: Re: [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk
 ---- On Tue, 10 Dec 2024 21:36:55 +0000  Keith Busch  wrote --- 
 > On Mon, Dec 09, 2024 at 04:33:01PM +0100, Paweł Anikiel wrote:
 > > On Mon, Dec 9, 2024 at 1:33 PM Robert Beckett <bob.beckett@...labora.com> wrote:
 > > > [...]
 > > > I have no further updates on this. I have received no further info from the vendor.
 > > > I think we can go ahead and use the alignment patch as is. The only outstanding question was whether it is an
 > > > implicit last entry per page chain vs simple alignment requirement. Either way, using the dmapool
 > > > alignment fixes both of these potential causes, so we should just take it as is.
 > > > If we ever get any better info and can do a more specific patch in future, we can rework it then.
 > > 
 > > I think the 512 byte alignment fix is good. I tried coming up with
 > > something more specific, but everything I could think of was either
 > > too complicated or artificial.
 > > 
 > > Regarding the question of whether this is an alignment requirement or
 > > the last PRP entry issue, I strongly believe it's the latter. I have a
 > > piece of code that clearly demonstrates the hardware bug when run on a
 > > device with the nvme bridge. I would really appreciate it if this was
 > > verified and my explanation was included in the patch.
 > 
 > I've pushed this to nvme-6.13 with an updated commit message here:
 > 
 >   https://git.infradead.org/?p=nvme.git;a=commitdiff;h=ccd84b8d6f4a60626dacb933b5d56dadca75c0ca
lgtm. Thanks!
 > 
 > I can force an update if you have any edit suggestions.
 > 
 > Commit message copied below:
 > 
 > Author: Robert Beckett <bob.beckett@...labora.com>
 > 
 > nvme-pci: 512 byte aligned dma pool segment quirk
 > 
 > We initially introduced a quick fix limiting the queue depth to 1 as
 > experimentation showed that it fixed data corruption on 64GB steamdecks.
 > 
 > Further experimentation revealed corruption only happens when the last
 > PRP data element aligns to the end of the page boundary. The device
 > appears to treat this as a PRP chain to a new list instead of the data
 > element that it actually is. This implementation is in violation of
 > the spec. Encountering this erratum with the Linux driver requires the
 > host to request a 128k transfer and coincidentally get the last small
 > pool dma buffer within a page.
 > 
 > The QD1 quirk effectively works around this because the last data PRP
 > always was at a 248 byte offset from the page start, so it never
 > appeared at the end of the page. Further, the MDTS is also small enough
 > that the "large" prp pool can hold enough PRP elements to never get to
 > the end, so that pool is not a problem either.
 > 
 > Introduce a new quirk to ensure the small pool is always aligned such
 > that the last PRP element can't appear at the end of the page. This comes
 > at the expense of wasting 256 bytes per small pool page allocated.
 > 
 > Link: https://lore.kernel.org/linux-nvme/20241113043151.GA20077@lst.de/T/#u
 > Fixes: 83bdfcbdbe5d ("nvme-pci: qdepth 1 quirk")
 > Cc: Paweł Anikiel <panikiel@...gle.com>
 > Signed-off-by: Robert Beckett <bob.beckett@...labora.com>
 > Signed-off-by: Keith Busch <kbusch@...nel.org>
 > 
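For anyone following the offsets in the commit message, the arithmetic can be sketched in a few lines of Python. This is not driver code, only a rough model under stated assumptions: a 4096 byte page, 8 byte PRP entries, a 256 byte small pool segment, and a pool that rounds each segment up to the requested alignment. The helper names (last_prp_offsets, hits_page_end) are made up for illustration.

```python
PAGE_SIZE = 4096        # assumed page size
PRP_ENTRY_SIZE = 8      # a PRP entry is a 64-bit address
SMALL_POOL_SIZE = 256   # small PRP list segment, i.e. 32 entries


def last_prp_offsets(align):
    """Page offsets of the final PRP slot of every small pool segment in one page.

    Assumption: the pool places segments at multiples of max(size, align),
    so a 512 byte alignment leaves every second 256 byte slot unused.
    """
    stride = max(SMALL_POOL_SIZE, align)
    last_slot = SMALL_POOL_SIZE - PRP_ENTRY_SIZE  # 248 bytes into the segment
    return [seg + last_slot for seg in range(0, PAGE_SIZE, stride)]


def hits_page_end(align):
    """True if some segment's final PRP slot occupies the last 8 bytes of the page."""
    return (PAGE_SIZE - PRP_ENTRY_SIZE) in last_prp_offsets(align)


if __name__ == "__main__":
    for align in (256, 512):
        print(align, max(last_prp_offsets(align)), hits_page_end(align))
```

With 256 byte alignment the sixteenth segment starts at 3840, so its final slot sits at 3840 + 248 = 4088, the last 8 bytes of the page. With 512 byte alignment the highest segment starts at 3584 and its final slot is at 3832, so no full small pool list can end on the page boundary. The first segment's final slot is at 248, which matches the QD1 case described in the commit message.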
 
