Message-ID: <193b55c7998.d66e7e7c1942154.6474606603462169748@collabora.com>
Date: Wed, 11 Dec 2024 10:55:56 +0000
From: Robert Beckett <bob.beckett@...labora.com>
To: "Keith Busch" <kbusch@...nel.org>
Cc: "Pawel Anikiel" <panikiel@...gle.com>, "axboe" <axboe@...nel.dk>,
"hch" <hch@....de>, "kernel" <kernel@...labora.com>,
"linux-kernel" <linux-kernel@...r.kernel.org>,
"linux-nvme" <linux-nvme@...ts.infradead.org>,
"sagi" <sagi@...mberg.me>
Subject: Re: [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk
---- On Tue, 10 Dec 2024 21:36:55 +0000 Keith Busch wrote ---
> On Mon, Dec 09, 2024 at 04:33:01PM +0100, Paweł Anikiel wrote:
> > On Mon, Dec 9, 2024 at 1:33 PM Robert Beckett <bob.beckett@...labora.com> wrote:
> > > [...]
> > > I have no further updates on this. I have received no further info from the vendor.
> > > I think we can go ahead and use the alignment patch as is. The only outstanding question was whether it is an
> > > implicit last-entry-per-page chain requirement vs a simple alignment requirement. Either way, using the dmapool
> > > alignment fixes both of these potential causes, so we should just take it as is.
> > > If we ever get any better info and can do a more specific patch in future, we can rework it then.
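
(For reference, the dmapool alignment fix amounts to roughly the sketch
below in nvme_setup_prp_pools(); the quirk flag name and values here are
illustrative, not quoted from the final patch:)

/*
 * Illustrative sketch only: align the 256-byte small PRP pool to 512
 * bytes so a segment can never end exactly on a page boundary.
 * NVME_QUIRK_DMAPOOL_ALIGN_512 is a placeholder name.
 */
size_t small_align = 256;

if (dev->ctrl.quirks & NVME_QUIRK_DMAPOOL_ALIGN_512)
	small_align = 512;

dev->prp_small_pool = dma_pool_create("prp list 256", dev->dev,
				      256, small_align, 0);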
> >
> > I think the 512 byte alignment fix is good. I tried coming up with
> > something more specific, but everything I could think of was either
> > too complicated or artificial.
> >
> > Regarding the question of whether this is an alignment requirement or
> > the last PRP entry issue, I strongly believe it's the latter. I have a
> > piece of code that clearly demonstrates the hardware bug when run on a
> > device with the nvme bridge. I would really appreciate it if this was
> > verified and my explanation was included in the patch.
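
(Paweł's reproducer isn't in this thread; the snippet below is only a
quick standalone illustration of the suspected trigger: a full 32-entry
PRP list in a 256-byte small-pool segment whose last 8-byte entry lands
in the final 8 bytes of a 4 KiB page, which the device then misreads as
a chain pointer. Assuming 4 KiB pages, 512-byte segment alignment makes
that offset impossible:)

#include <stdio.h>

#define PAGE_SIZE	4096	/* assuming 4 KiB pages */
#define SEG_SIZE	256	/* small PRP pool segment: 32 x 8-byte entries */

/*
 * A full segment whose end coincides with the page end puts the last
 * 8-byte PRP entry in the final 8 bytes of the page - the case the
 * device appears to mishandle as a chain pointer.
 */
static int last_entry_at_page_end(unsigned int seg_off)
{
	return (seg_off + SEG_SIZE) % PAGE_SIZE == 0;
}

int main(void)
{
	unsigned int align, off;

	for (align = 256; align <= 512; align *= 2)
		for (off = 0; off + SEG_SIZE <= PAGE_SIZE; off += align)
			if (last_entry_at_page_end(off))
				printf("align %u: segment at offset %u ends on the page boundary\n",
				       align, off);
	return 0;
}

With 256-byte alignment the segment at offset 3840 ends exactly on the
page boundary; with 512-byte alignment no segment can.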
>
> I've pushed this to nvme-6.13 with an updated commit message here:
>
> https://git.infradead.org/?p=nvme.git;a=commitdiff;h=ccd84b8d6f4a60626dacb933b5d56dadca75c0ca
lgtm. Thanks!
>
> I can force an update if you have any edit suggestions.
>
> Commit message copied below:
>
> Author: Robert Beckett <bob.beckett@...labora.com>
>
> nvme-pci: 512 byte aligned dma pool segment quirk
>
> We initially introduced a quick fix limiting the queue depth to 1 as
> experimentation showed that it fixed data corruption on 64GB Steam Decks.
>
> Further experimentation revealed corruption only happens when the last
> PRP data element aligns to the end of the page boundary. The device
> appears to treat this as a PRP chain to a new list instead of the data
> element that it actually is. This implementation is in violation of the
> spec. Encountering this erratum with the Linux driver requires that the
> host request a 128k transfer and coincidentally get the last small pool
> dma buffer within a page.
>
> The QD1 quirk effectively works around this because the last data PRP
> was always at a 248 byte offset from the page start, so it never
> appeared at the end of the page. Further, the MDTS is also small enough
> that the "large" prp pool can hold enough PRP elements to never get to
> the end, so that pool is not a problem either.
>
> Introduce a new quirk to ensure the small pool is always aligned such
> that the last PRP element can't appear at the end of the page. This comes
> at the expense of wasting 256 bytes per small pool page allocated.
>
> Link: https://lore.kernel.org/linux-nvme/20241113043151.GA20077@lst.de/T/#u
> Fixes: 83bdfcbdbe5d ("nvme-pci: qdepth 1 quirk")
> Cc: Paweł Anikiel <panikiel@...gle.com>
> Signed-off-by: Robert Beckett <bob.beckett@...labora.com>
> Signed-off-by: Keith Busch <kbusch@...nel.org>
>
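For completeness, hooking the quirk up to the affected bridge would follow
the usual nvme_id_table pattern; the PCI IDs below are placeholders, not
the real device IDs, and the quirk name mirrors the sketch earlier in the
thread:

	/* Placeholder PCI IDs - illustration only, not the real bridge IDs */
	{ PCI_DEVICE(0x1234, 0x5678),
		.driver_data = NVME_QUIRK_DMAPOOL_ALIGN_512, },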