Message-ID:
<TYZP153MB0627DED95B9B9B2E86D66EFED762A@TYZP153MB0627.APCP153.PROD.OUTLOOK.COM>
Date: Mon, 2 Jun 2025 21:32:18 +0000
From: Mitta Sai Chaithanya <mittas@...rosoft.com>
To: Theodore Ts'o <tytso@....edu>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>, Nilesh Awate
<Nilesh.Awate@...rosoft.com>, Ganesan Kalyanasundaram
<ganesanka@...rosoft.com>, Pawan Sharma <sharmapawan@...rosoft.com>
Subject: Re: [EXTERNAL] Re: EXT4/JBD2 Not Fully Released device after unmount
of NVMe-oF Block Device
Hi Ted,
Thanks for your quick response. You're right that we use a bind mount; however,
I'm certain that we first unmount the bind mount before unmounting the original mount.
I also checked in a different namespace and couldn't find any reference to the NVMe device still being mounted.
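For reference, here is roughly the kind of check I mean: a minimal sketch that scans every process's mount table, so it covers all mount namespaces (the device name nvme0n1 is from our setup; run as root):

    # Sketch: look for any remaining reference to the device in any
    # mount namespace by scanning each process's mount table.
    for pid in /proc/[0-9]*; do
        if grep -q nvme0n1 "$pid/mountinfo" 2>/dev/null; then
            echo "$pid still references nvme0n1"
        fi
    done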
> 2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6
> Rather, it appears that ext4/jbd2 is issuing I/Os after the NVMe-oF connection has been torn down.
Yes, I am reproducing the issue, and I expected to see connection outage errors for a few seconds (i.e., within the tolerable time frame).
However, after the connection is re-established and the device is unmounted from all namespaces, I still observe errors from both ext4 and jbd2,
especially when the device is disconnected.
> So when you say that /proc/fs/ext4/<device_name> still exists, that is an
> indication that "struct super" for that particular ext4 file system is
> still alive, and so of course, there can still be ext4 and jbd2 I/O
> activity happening.
So even when no user-space process is holding the device, and it has been unmounted from all namespaces, mounts, and bind mounts,
is there still a possibility of I/O occurring on the device? If so, how long does the kernel typically take to flush any remaining I/O operations,
whether from ext4 or jbd2?
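For what it's worth, the way I would try to measure that window is to poll the procfs entries until they disappear. A rough sketch (the jbd2 name nvme0n1-8 is taken from the logs below; the exact names may differ on other setups):

    # Sketch: after the last umount, poll until ext4/jbd2 have fully
    # torn down the super block, and report how long that took.
    start=$(date +%s)
    while [ -d /proc/fs/ext4/nvme0n1 ] || [ -d /proc/fs/jbd2/nvme0n1-8 ]; do
        sleep 1
    done
    echo "super block gone after $(( $(date +%s) - start ))s"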
Another point I would like to mention: I am observing JBD2 errors specifically after the NVMe-oF device has been disconnected; the logs are below.
Logs:
[Wed May 14 16:58:50 2025] nvme nvme0: Removing ctrl: NQN "nqn.2019-05.io.openebs:4cde20d8-ed8f-47ef-90c7-8cf9521a5734"
[Wed May 14 16:58:50 2025] Buffer I/O error on dev nvme0n1, logical block 1081344, lost sync page write
[Wed May 14 16:58:50 2025] JBD2: Error -5 detected when updating journal superblock for nvme0n1-8.
[Wed May 14 16:58:50 2025] Aborting journal on device nvme0n1-8.
[Wed May 14 16:58:50 2025] blk_update_request: recoverable transport error, dev nvme0n1, sector 8650752 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
[Wed May 14 16:58:50 2025] Buffer I/O error on dev nvme0n1, logical block 1081344, lost sync page write
[Wed May 14 16:58:50 2025] JBD2: Error -5 detected when updating journal superblock for nvme0n1-8.
[Wed May 14 16:58:50 2025] EXT4-fs error (device nvme0n1): ext4_put_super:1205: comm ig: Couldn't clean up the journal
[Wed May 14 16:58:50 2025] blk_update_request: recoverable transport error, dev nvme0n1, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
[Wed May 14 16:58:50 2025] Buffer I/O error on dev nvme0n1, logical block 0, lost sync page write
[Wed May 14 16:58:50 2025] EXT4-fs (nvme0n1): I/O error while writing superblock
[Wed May 14 16:58:50 2025] EXT4-fs (nvme0n1): Remounting filesystem read-only
[Wed May 14 16:58:50 2025] blk_update_request: recoverable transport error, dev nvme0n1, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
[Wed May 14 16:58:50 2025] Buffer I/O error on dev nvme0n1, logical block 0, lost sync page write
[Wed May 14 16:58:50 2025] EXT4-fs (nvme0n1): I/O error while writing superblock
Thanks & Regards,
Sai
________________________________________
From: Theodore Ts'o <tytso@....edu>
Sent: Monday, June 02, 2025 03:34
To: Mitta Sai Chaithanya <mittas@...rosoft.com>
Cc: linux-ext4@...r.kernel.org <linux-ext4@...r.kernel.org>; Nilesh Awate <Nilesh.Awate@...rosoft.com>; Ganesan Kalyanasundaram <ganesanka@...rosoft.com>; Pawan Sharma <sharmapawan@...rosoft.com>
Subject: [EXTERNAL] Re: EXT4/JBD2 Not Fully Released device after unmount of NVMe-oF Block Device
On Sun, Jun 01, 2025 at 11:02:05AM +0000, Mitta Sai Chaithanya wrote:
> Hi Team,
>
> I'm encountering journal block device (JBD2) errors after unmounting
> a device and have been trying to trace the source of
> these errors. I've observed that these JBD2 errors only
> occur if the entries under /proc/fs/ext4/<device_name> or
> /proc/fs/jbd2/<device_name> still exist even after a
> successful unmount (the unmount command returns success).
What you are seeing are I/O errors, not jbd2 errors, i.e.,
> 2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6
These errors may have been caused by the jbd2 layer issuing I/O
requests, but these are not failures of the jbd2 subsystem. Rather,
it appears that ext4/jbd2 is issuing I/Os after the NVMe-oF
connection has been torn down.
It appears that you are assuming that once the umount command/system
call has successfully returned, the kernel file system will be done
sending I/O requests to the block device. This is simply not true.
For example, consider what happens if you do something like:
# mount /dev/sda1 /mnt
# mount --bind /mnt /mnt2
# umount /mnt
The umount command will have returned successfully, but the ext4 file
system is still mounted, thanks to the bind mount. And it's not just
bind mounts. If you have one or more processes in a different mount
namespace (created using clone(2) with the CLONE_NEWNS flag), then so
long as those processes are active, the file system will stay active
regardless of the file system being unmounted in the original mount
namespace.
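You can demonstrate the namespace case from the shell without writing
any code. A sketch using util-linux's unshare (which sets up a new
mount namespace with private propagation):

    # Park a process in its own mount namespace while /dev/sda1 is mounted.
    mount /dev/sda1 /mnt
    unshare --mount --propagation private sleep 300 &

    umount /mnt        # succeeds in this namespace...
    ls /proc/fs/ext4/  # ...but sda1 is still listed, because the sleep
                       # process's namespace still holds the mount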
Internally, inside the kernel, this is the distinction between the
"struct super" object and the "struct vfsmnt" object. The umount(2)
system call removes the vfsmnt object from a mount namespace object
and decrements the refcount of the vfsmnt object.
The "struct super" object can not be deleted so long as there is at
least one vfsmnt object pointing at the "struct super" object. So
when you say that /proc/fs/ext4/<device_name> still exists, that is an
indication that "struct super" for that particular ext4 file system is
still alive, and so of course, there can still be ext4 and jbd2 I/O
activity happening.
> I'd like to understand how to debug this issue further to determine
> the root cause. Specifically, I’m looking for guidance on what
> kernel-level references or subsystems might still be holding on to
> the journal or device structures post-unmount, and how to trace or
> identify them effectively, or has this been fixed in the latest
> versions of ext4?
I don't see any evidence of anything "wrong" that requires fixing in
the kernel. It looks like something or someone assumed that the file
system was deactivated after the umount and then tore down the NVMe-oF
TCP connection, even though the file system was still active,
resulting in those errors.
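If the goal is to tear down the transport safely, the disconnect needs
to be gated on the file system actually going away. Something along
these lines (a sketch; findmnt only sees the current mount namespace,
which is why the procfs check is there too, and $SUBSYS_NQN stands in
for your subsystem NQN):

    # Sketch: only sever the NVMe-oF connection once nothing still
    # holds the file system and the super block has been torn down.
    while findmnt --source /dev/nvme0n1 >/dev/null ||
          [ -d /proc/fs/ext4/nvme0n1 ]; do
        sleep 1
    done
    nvme disconnect -n "$SUBSYS_NQN"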
But that's not a kernel bug; rather, it's a bug in some human's
understanding of how umount works in the context of bind mounts and
mount namespaces.
Cheers,
- Ted