Message-ID: <4c258a85-7b2d-4946-a64f-d0341c444119@amd.com>
Date: Mon, 14 Apr 2025 16:55:07 +0530
From: "Aithal, Srikanth" <sraithal@....com>
To: Hannes Reinecke <hare@...e.de>, hare@...nel.org
Cc: sagi@...mberg.me, hch@....de, kbusch@...nel.org, Ankit.Soni@....com,
Vasant Hegde <vasant.hegde@....com>, open list
<linux-kernel@...r.kernel.org>,
Linux-Next Mailing List <linux-next@...r.kernel.org>
Subject: Re: Patch "nvme: re-read ANA log page after ns scan completes"
causing regression
On 4/14/2025 4:39 PM, Hannes Reinecke wrote:
> On 4/14/25 12:53, Aithal, Srikanth wrote:
>> Hello,
>>
>> With the patch below in today's linux-next (next-20250414) and in
>> v6.15-rc2, we are seeing host boot issues. A host with an NVMe disk
>> just hangs on boot.
>>
>> If we revert this patch or disable CONFIG_NVME_MULTIPATH, the host
>> boots fine.
>>
>> commit 62baf70c327444338c34703c71aa8cc8e4189bd6
>> Author: Hannes Reinecke <hare@...nel.org>
>> Date: Thu Apr 3 09:19:30 2025 +0200
>>
>> nvme: re-read ANA log page after ns scan completes
>>
>> When scanning for new namespaces we might have missed an ANA AEN.
>>
>> The NVMe base spec (NVMe Base Specification v2.1, Figure 151
>> 'Asynchronous Event Information - Notice': Asymmetric Namespace
>> Access Change) states:
>>
>> A controller shall not send this event if an Attached Namespace
>> Attribute Changed asynchronous event [...] is sent for the
>> same event.
>>
>> so we need to re-read the ANA log page after we rescanned the
>> namespace list to update the ANA states of the new namespaces.
>>
>> Signed-off-by: Hannes Reinecke <hare@...nel.org>
>> Reviewed-by: Keith Busch <kbusch@...nel.org>
>> Signed-off-by: Christoph Hellwig <hch@....de>
>>
>>
>> The host console starts dumping a lot of errors and the log grows to
>> more than 100 MB, so I am not posting the full log here. Here is
>> part of it:
>> ...
>> ...
>> [ 49.361223] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
>> [ 49.434564] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
>> [ 49.443123] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [ 49.457080] nvme nvme0: Failed to get ANA log: -4
>> [ 49.506511] nvme nvme0: D3 entry latency set to 8 seconds
>> [ 49.536300] nvme nvme0: 32/0/0 default/read/poll queues
>> [ 49.605281] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
>> [ 80.081190] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
>> [ 80.154109] nvme0n1: I/O Cmd(0x2) @ LBA 128, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
>> [ 80.162864] I/O error, dev nvme0n1, sector 128 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [ 80.177032] nvme nvme0: Failed to get ANA log: -4
>> [ 80.225460] nvme nvme0: D3 entry latency set to 8 seconds
>> [ 80.255395] nvme nvme0: 32/0/0 default/read/poll queues
>> [ 80.301278] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
>> [ 110.789207] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
>> [ 110.861990] nvme0n1: I/O Cmd(0x2) @ LBA 2048, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
>> [ 110.870842] I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
>> [ 110.885040] nvme nvme0: Failed to get ANA log: -4
>> [ 110.933460] nvme nvme0: D3 entry latency set to 8 seconds
>> [ 110.963447] nvme nvme0: 32/0/0 default/read/poll queues
>> [ 111.009276] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
>> ...
>> ...
>>
>>
> Can you try this?
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 78963cab1f74..425c00b02f3e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4455,7 +4455,7 @@ static void nvme_scan_work(struct work_struct *work)
>  	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
>  		nvme_queue_scan(ctrl);
>  #if CONFIG_NVME_MULTIPATH
> -	else
> +	else if (ctrl->ana_log_buf)
>  		/* Re-read the ANA log page to not miss updates */
>  		queue_work(nvme_wq, &ctrl->ana_work);
>  #endif
I applied it on top of next-20250414 and tested it; it fixes the issue.
Tested-by: Srikanth Aithal <sraithal@....com>
>
> Cheers,
>
> Hannes
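
For anyone following the thread, the decision the patched hunk makes at
the end of nvme_scan_work() can be modelled in user space roughly as
below. This is only a sketch, not the driver code: struct mock_ctrl,
enum scan_followup and scan_followup() are hypothetical names invented
for illustration, and the assumption that ana_log_buf is NULL on
controllers without ANA support is my reading of the fix, not something
stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of controller state the hunk reads. */
struct mock_ctrl {
	bool ns_changed_pending; /* models test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events) */
	void *ana_log_buf;       /* assumed NULL when no ANA log buffer was set up */
};

/* Hypothetical labels for the follow-up work the hunk may queue. */
enum scan_followup {
	FOLLOWUP_NONE,     /* nothing queued */
	FOLLOWUP_RESCAN,   /* models nvme_queue_scan(ctrl) */
	FOLLOWUP_ANA_WORK, /* models queue_work(nvme_wq, &ctrl->ana_work) */
};

/* Mirrors the patched if / else-if chain: rescan first, ANA re-read only with a buffer. */
static enum scan_followup scan_followup(const struct mock_ctrl *ctrl)
{
	assert(ctrl != NULL);

	if (ctrl->ns_changed_pending)
		return FOLLOWUP_RESCAN;
	else if (ctrl->ana_log_buf)
		return FOLLOWUP_ANA_WORK;

	return FOLLOWUP_NONE;
}
```

With the original bare "else", the second branch would be taken even when
ana_log_buf is NULL, which would match the repeated "Failed to get ANA
log" messages in the report above.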