Message-ID: <e1f2ac49-25f4-4b2c-b67c-10782b4e3455@suse.de>
Date: Mon, 14 Apr 2025 13:09:50 +0200
From: Hannes Reinecke <hare@...e.de>
To: "Aithal, Srikanth" <sraithal@....com>, hare@...nel.org
Cc: sagi@...mberg.me, hch@....de, kbusch@...nel.org, Ankit.Soni@....com,
 Vasant Hegde <vasant.hegde@....com>, open list
 <linux-kernel@...r.kernel.org>,
 Linux-Next Mailing List <linux-next@...r.kernel.org>
Subject: Re: Patch "nvme: re-read ANA log page after ns scan completes"
 causing regression

On 4/14/25 12:53, Aithal, Srikanth wrote:
> Hello,
> 
> With the patch below, today's linux-next (next-20250414) and v6.15-rc2
> show host boot issues: a host with an NVMe disk simply hangs on boot.
> 
> If we revert this patch or disable CONFIG_NVME_MULTIPATH, the host boots
> fine.
> 
> commit 62baf70c327444338c34703c71aa8cc8e4189bd6
> Author: Hannes Reinecke <hare@...nel.org>
> Date:   Thu Apr 3 09:19:30 2025 +0200
> 
>      nvme: re-read ANA log page after ns scan completes
> 
>      When scanning for new namespaces we might have missed an ANA AEN.
> 
>      The NVMe base spec (NVMe Base Specification v2.1, Figure 151
>      'Asynchronous Event Information - Notice': Asymmetric Namespace
>      Access Change) states:
> 
>        A controller shall not send this event if an Attached Namespace
>        Attribute Changed asynchronous event [...] is sent for the same
>        event.
> 
>      so we need to re-read the ANA log page after we rescanned the
>      namespace list to update the ANA states of the new namespaces.
> 
>      Signed-off-by: Hannes Reinecke <hare@...nel.org>
>      Reviewed-by: Keith Busch <kbusch@...nel.org>
>      Signed-off-by: Christoph Hellwig <hch@....de>
> 
> 
> The host console starts dumping a large number of errors and the log
> grows beyond 100 MB, so I am pasting only part of it here:
> ...
> ...
> [   49.361223] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
> [   49.434564] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
> [   49.443123] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   49.457080] nvme nvme0: Failed to get ANA log: -4
> [   49.506511] nvme nvme0: D3 entry latency set to 8 seconds
> [   49.536300] nvme nvme0: 32/0/0 default/read/poll queues
> [   49.605281] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
> [   80.081190] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
> [   80.154109] nvme0n1: I/O Cmd(0x2) @ LBA 128, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
> [   80.162864] I/O error, dev nvme0n1, sector 128 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [   80.177032] nvme nvme0: Failed to get ANA log: -4
> [   80.225460] nvme nvme0: D3 entry latency set to 8 seconds
> [   80.255395] nvme nvme0: 32/0/0 default/read/poll queues
> [   80.301278] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
> [  110.789207] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
> [  110.861990] nvme0n1: I/O Cmd(0x2) @ LBA 2048, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
> [  110.870842] I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [  110.885040] nvme nvme0: Failed to get ANA log: -4
> [  110.933460] nvme nvme0: D3 entry latency set to 8 seconds
> [  110.963447] nvme nvme0: 32/0/0 default/read/poll queues
> [  111.009276] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
> ...
> ...
> 
> 
Can you try this?

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 78963cab1f74..425c00b02f3e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4455,7 +4455,7 @@ static void nvme_scan_work(struct work_struct *work)
         if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
                 nvme_queue_scan(ctrl);
  #if CONFIG_NVME_MULTIPATH
-       else
+       else if (ctrl->ana_log_buf)
                 /* Re-read the ANA log page to not miss updates */
                 queue_work(nvme_wq, &ctrl->ana_work);
  #endif
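
To illustrate what the added check does, here is a minimal userspace
sketch, not kernel code. struct mock_ctrl and mock_scan_work_tail() are
hypothetical stand-ins for struct nvme_ctrl and the tail of
nvme_scan_work(); the two counters replace nvme_queue_scan() and
queue_work(). The sketch assumes that ana_log_buf is NULL on controllers
for which no ANA log buffer was allocated, which is what the new
"else if" condition implies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct nvme_ctrl used here. */
struct mock_ctrl {
	bool ns_changed_pending; /* models NVME_AER_NOTICE_NS_CHANGED in ctrl->events */
	void *ana_log_buf;       /* assumed NULL when no ANA log buffer exists */
	int scans_queued;        /* counts what would be nvme_queue_scan() calls */
	int ana_reads_queued;    /* counts what would be queue_work(..., &ana_work) calls */
};

/* Models the decision at the end of the scan work with the guard applied. */
static void mock_scan_work_tail(struct mock_ctrl *ctrl)
{
	if (ctrl->ns_changed_pending)
		ctrl->scans_queued++;
	else if (ctrl->ana_log_buf)
		/* Re-read the ANA log page only if there is a buffer to read into */
		ctrl->ana_reads_queued++;
}
```

With the guard, a controller without an ANA log buffer queues neither a
rescan nor an ANA re-read. Without it, the ANA work would be queued
unconditionally whenever no namespace-change notice is pending.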

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@...e.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
