Message-ID: <CAM5zL5rKsEd1EhOx1AGj9Au7-FQnJ5fUX2hLPEDQvmcrJXFFBg@mail.gmail.com>
Date: Thu, 14 Nov 2024 15:13:52 +0100
From: Paweł Anikiel <panikiel@...gle.com>
To: Robert Beckett <bob.beckett@...labora.com>
Cc: axboe <axboe@...nel.dk>, hch <hch@....de>, kbusch <kbusch@...nel.org>, 
	kernel <kernel@...labora.com>, linux-kernel <linux-kernel@...r.kernel.org>, 
	linux-nvme <linux-nvme@...ts.infradead.org>, sagi <sagi@...mberg.me>
Subject: Re: [PATCH] nvme-pci: 512 byte aligned dma pool segment quirk

On Thu, Nov 14, 2024 at 2:24 PM Robert Beckett
<bob.beckett@...labora.com> wrote:
> This is interesting.
> I had the same idea previously. I initially just changed the hard-coded 256 / 8 to use 31 instead, which should have ensured the last entry of each segment never gets used.
> When I tested that, it no longer failed, which was a good sign. So then I modified it to only do that on the last 256 byte segment of a page, but then it started failing again.

Could you elaborate on the "only do that on the last 256 byte segment
of a page" part? How did you check which chunk of the page would be
allocated before choosing the dma pool?
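
For reference, the arithmetic being discussed can be sketched in plain C.
This is only an illustration, not code from any patch in this thread:
is_last_segment_of_page() is a hypothetical helper, and it can only be
evaluated once the DMA address of the segment is known, i.e. after the
allocation has been made.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_SIZE       256u  /* small PRP pool segment size, per the thread */
#define PRP_ENTRY_SIZE 8u    /* one PRP entry is a 64-bit address */

/* Number of PRP entries that fit in one segment: 256 / 8 = 32. */
static unsigned int prps_per_segment(void)
{
	return SEG_SIZE / PRP_ENTRY_SIZE;
}

/*
 * Hypothetical helper: true if the 256-byte segment starting at dma_addr
 * is the last segment of its page. page_size must be a power of two.
 */
static bool is_last_segment_of_page(uint64_t dma_addr, uint64_t page_size)
{
	uint64_t offset = dma_addr & (page_size - 1);

	return offset == page_size - SEG_SIZE;
}
```

Using 31 instead of 32 entries, as described in the quoted text, would
leave the final 8-byte slot of every segment unused.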

> I never saw any bus error during my testing, just wrong data read, which then fails image verification. I was expecting iommu error logs if it was trying to access a chain into nowhere, if it always interpreted the last entry in a page as a link. I never saw any iommu errors.

Maybe I misspoke; the "bus error" part was just my speculation. I
didn't look at the IOMMU logs or anything like that.

> I'd be glad to if you could share your testing method.

I dumped all the nvme transfers before the crash happened (using
tracefs), and I saw a read of size 264 = 8 + 256, which led me to the
chaining theory. To test this claim, I wrote a simple PCI device
driver which creates one IO queue and submits a read command where the
PRP list is set up in a way that tests whether the controller treats
it as a chained list or not. I ran it, and it indeed treated the last
PRP entry as a chained pointer.

It is possible that this is not the only thing that's wrong. Could you
share your patch that checks your "only do that on the last 256 byte
segment of a page" idea?
