Message-ID: <9a800759-b7f1-46dc-977c-7e39532ddec4@amd.com>
Date: Mon, 14 Apr 2025 16:23:50 +0530
From: "Aithal, Srikanth" <sraithal@....com>
To: hare@...nel.org
Cc: sagi@...mberg.me, hch@....de, kbusch@...nel.org, Ankit.Soni@....com,
 Vasant Hegde <vasant.hegde@....com>, open list
 <linux-kernel@...r.kernel.org>,
 Linux-Next Mailing List <linux-next@...r.kernel.org>
Subject: Patch "nvme: re-read ANA log page after ns scan completes" causing
 regression

Hello,

With the patch below in today's linux-next (next-20250414) and in v6.15-rc2,
we are seeing boot issues: a host with an NVMe disk hangs during boot.

If we revert this patch or disable CONFIG_NVME_MULTIPATH, the host boots
fine.

commit 62baf70c327444338c34703c71aa8cc8e4189bd6
Author: Hannes Reinecke <hare@...nel.org>
Date:   Thu Apr 3 09:19:30 2025 +0200

     nvme: re-read ANA log page after ns scan completes

     When scanning for new namespaces we might have missed an ANA AEN.

     The NVMe base spec (NVMe Base Specification v2.1, Figure 151
     'Asynchronous Event Information - Notice': Asymmetric Namespace Access
     Change) states:

       A controller shall not send this event if an Attached Namespace
       Attribute Changed asynchronous event [...] is sent for the same event.

     so we need to re-read the ANA log page after we rescanned the namespace
     list to update the ANA states of the new namespaces.

     Signed-off-by: Hannes Reinecke <hare@...nel.org>
     Reviewed-by: Keith Busch <kbusch@...nel.org>
     Signed-off-by: Christoph Hellwig <hch@....de>
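The reasoning in the commit message can be illustrated with a toy model. This is a hypothetical Python sketch, not kernel code: the Controller and Host classes, the NS_ATTR_CHANGED and ANA_CHANGE event names, and the "optimized" state string are made up for illustration. It assumes the controller follows the quoted spec rule and reports a newly attached namespace only through a Namespace Attribute Changed event, so a host that refreshes ANA state only on an ANA Change event never learns the new namespace's state.

```python
# Toy model (hypothetical, not kernel code) of why the host re-reads the
# ANA log page after a namespace rescan.
from dataclasses import dataclass, field


@dataclass
class Controller:
    # ANA state per namespace ID, as the controller sees it.
    ana_log: dict[int, str] = field(default_factory=dict)

    def attach_namespace(self, nsid: int, state: str) -> list[str]:
        """Attach a namespace and return the AENs the controller sends.

        Following the quoted spec rule, no ANA Change AEN is sent because
        a Namespace Attribute Changed AEN covers the same event."""
        self.ana_log[nsid] = state
        return ["NS_ATTR_CHANGED"]


@dataclass
class Host:
    reread_after_scan: bool
    known_ns: set[int] = field(default_factory=set)
    ana_state: dict[int, str] = field(default_factory=dict)

    def handle_aens(self, ctrl: Controller, aens: list[str]) -> None:
        if "NS_ATTR_CHANGED" in aens:
            # Rescan: discovers namespaces but not their ANA state.
            self.known_ns |= set(ctrl.ana_log)
            if self.reread_after_scan:
                # Behaviour the commit adds: re-read the ANA log after the scan.
                self.ana_state.update(ctrl.ana_log)
        if "ANA_CHANGE" in aens:
            self.ana_state.update(ctrl.ana_log)


def run(reread_after_scan: bool) -> dict[int, str]:
    ctrl = Controller()
    host = Host(reread_after_scan=reread_after_scan)
    host.handle_aens(ctrl, ctrl.attach_namespace(1, "optimized"))
    return host.ana_state
```

In this model, run(False) returns an empty dict (the new namespace never gets an ANA state), while run(True) returns {1: "optimized"}.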


The host console starts dumping a large number of errors and the log grows
beyond 100 MB, so I am not posting the full log here. Here is an excerpt:
...
...
[   49.361223] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
[   49.434564] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[   49.443123] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   49.457080] nvme nvme0: Failed to get ANA log: -4
[   49.506511] nvme nvme0: D3 entry latency set to 8 seconds
[   49.536300] nvme nvme0: 32/0/0 default/read/poll queues
[   49.605281] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
[   80.081190] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
[   80.154109] nvme0n1: I/O Cmd(0x2) @ LBA 128, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[   80.162864] I/O error, dev nvme0n1, sector 128 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   80.177032] nvme nvme0: Failed to get ANA log: -4
[   80.225460] nvme nvme0: D3 entry latency set to 8 seconds
[   80.255395] nvme nvme0: 32/0/0 default/read/poll queues
[   80.301278] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
[  110.789207] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x1010
[  110.861990] nvme0n1: I/O Cmd(0x2) @ LBA 2048, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[  110.870842] I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  110.885040] nvme nvme0: Failed to get ANA log: -4
[  110.933460] nvme nvme0: D3 entry latency set to 8 seconds
[  110.963447] nvme nvme0: 32/0/0 default/read/poll queues
[  111.009276] nvme 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0x0 flags=0x0000]
...
...
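To help read the repeating log excerpt, a small decoding helper follows. This is an illustrative Python sketch; the function and table names are hypothetical. It relies on the NVMe CSTS register layout (bit 0 RDY, bit 1 CFS) and on the Linux kernel convention of encoding an NVMe status as (SCT << 8) | SC. Under those assumptions, CSTS=0x3 means the controller reports both Ready and Controller Fatal Status, and "sct 0x3 / sc 0x71" is the Path Related status "Command Aborted By Host". Separately, the "-4" in "Failed to get ANA log: -4" is -EINTR on Linux.

```python
# Illustrative helpers (hypothetical names) for decoding fields in the log above.

# NVMe Controller Status (CSTS) register bits.
CSTS_RDY = 1 << 0  # Ready
CSTS_CFS = 1 << 1  # Controller Fatal Status

# A few Path Related Status (SCT 0x3) codes, keyed as (SCT << 8) | SC.
PATH_STATUS = {
    0x370: "Host Pathing Error",
    0x371: "Command Aborted By Host",
}


def decode_csts(csts: int) -> list[str]:
    """Return the names of the RDY and CFS bits set in a CSTS value."""
    names = []
    if csts & CSTS_RDY:
        names.append("RDY")
    if csts & CSTS_CFS:
        names.append("CFS")
    return names


def decode_status(sct: int, sc: int) -> str:
    """Combine Status Code Type and Status Code, then look up a name."""
    code = (sct << 8) | sc
    return PATH_STATUS.get(code, f"unknown status 0x{code:03x}")


print(decode_csts(0x3))          # ['RDY', 'CFS']
print(decode_status(0x3, 0x71))  # Command Aborted By Host
```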



View attachment "kconfig" of type "text/plain" (185194 bytes)