Message-ID: <ZCHfmmv9WPxM4fD7@kbusch-mbp.dhcp.thefacebook.com>
Date: Mon, 27 Mar 2023 12:25:30 -0600
From: Keith Busch <kbusch@...nel.org>
To: Aleksander Trofimowicz <alex@....eu>
Cc: Bjorn Helgaas <helgaas@...nel.org>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
Lukas Wunner <lukas@...ner.de>, linux-pci@...r.kernel.org,
linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [bugzilla-daemon@...nel.org: [Bug 217251] New: pciehp: nvme not
visible after re-insert to tbt port]
On Mon, Mar 27, 2023 at 05:43:18PM +0000, Aleksander Trofimowicz wrote:
>
> Keith Busch <kbusch@...nel.org> writes:
>
> > On Mon, Mar 27, 2023 at 09:33:59AM -0500, Bjorn Helgaas wrote:
> >> Forwarding to NVMe folks, lists for visibility.
> >>
> >> ----- Forwarded message from bugzilla-daemon@...nel.org -----
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217251
> >> ...
> >>
> >> Created attachment 304031
> >> --> https://bugzilla.kernel.org/attachment.cgi?id=304031&action=edit
> >> the tracing of nvme_pci_enable() during re-insertion
> >>
> >> Hi,
> >>
> >> There is a JHL7540-based device that may host an NVMe device. After the first
> >> insertion an nvme drive is properly discovered and handled by the relevant
> >> modules. Once disconnected any further attempts are not successful. The device
> >> is visible on a PCI bus, but nvme_pci_enable() ends up calling
> >> pci_disable_device() every time; the runtime PM status of the device is
> >> "suspended", the power status of the 04:01.0 PCI bridge is D3. Preventing the
> >> device from being power managed ("on" -> /sys/devices/../power/control)
> >> combined with device removal and pci rescan changes nothing. A host reboot
> >> restores the initial state.
> >>
> >> I would appreciate any suggestions how to debug it further.
> >
> > Sounds the same as this report:
> >
> > http://lists.infradead.org/pipermail/linux-nvme/2023-March/038259.html
> >
> > The driver is bailing on the device because we can't read its status register
> > out of the remapped BAR. There's nothing we can do about that from the nvme
> > driver level. Memory mapped IO has to work in order to proceed.
> >
> Thanks. I can confirm it is the same problem:
>
> a) the platform is Intel Alderlake
> b) readl(dev->bar + NVME_REG_CSTS) in nvme_pci_enable() fails
> c) reading BAR0 via setpci gives 0x00000004

It's strange too. In your example, the kernel says:

  0000:05:00.0: BAR 0: assigned [mem 0x54000000-0x54003fff 64bit]

There is a check right after that message that ensures the kernel reads back
what it wrote. No failures reported means the device really did have the
expected BAR value at one point.
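
To make the mismatch concrete, here is a minimal sketch that decodes the low
dword of a memory BAR and compares the setpci reading (0x00000004) against the
value implied by the assigned window (0x54000000, 64bit, not prefetchable).
The helper names decode_mem_bar() and expected_low_dword() are made up for
this illustration and are not kernel functions; the bit layout is the standard
PCI memory BAR encoding. It only covers the low dword, which is enough here
because the assigned base fits in 32 bits.

```python
# Standard PCI memory BAR layout (low dword).
BAR_SPACE_IO = 0x1              # bit 0: 0 = memory space, 1 = I/O space
BAR_MEM_TYPE_MASK = 0x6         # bits 2:1: 0b00 = 32-bit, 0b10 = 64-bit
BAR_MEM_TYPE_64 = 0x4
BAR_MEM_PREFETCH = 0x8          # bit 3: prefetchable
BAR_MEM_ADDR_MASK = 0xFFFFFFF0  # bits 31:4: base address


def decode_mem_bar(low_dword):
    """Split the low dword of a memory BAR into its fields (hypothetical helper)."""
    if low_dword & BAR_SPACE_IO:
        raise ValueError("not a memory BAR")
    return {
        "is_64bit": (low_dword & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64,
        "prefetchable": bool(low_dword & BAR_MEM_PREFETCH),
        "address": low_dword & BAR_MEM_ADDR_MASK,
    }


def expected_low_dword(base, is_64bit, prefetchable=False):
    """Low dword one would expect to read back for an assigned window."""
    value = base & BAR_MEM_ADDR_MASK
    if is_64bit:
        value |= BAR_MEM_TYPE_64
    if prefetchable:
        value |= BAR_MEM_PREFETCH
    return value


if __name__ == "__main__":
    observed = decode_mem_bar(0x00000004)                     # value read via setpci
    expected = expected_low_dword(0x54000000, is_64bit=True)  # from the kernel log
    print("observed fields:", observed)
    print("expected low dword: 0x%08x" % expected)
```

Read this way, the type bits still say "64-bit memory BAR" but the address bits
are all zero, so the window the kernel assigned is no longer programmed in the
device by the time setpci reads it.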