Message-ID: <20250601220418.GC179983@mit.edu>
Date: Sun, 1 Jun 2025 22:04:18 +0000
From: "Theodore Ts'o" <tytso@....edu>
To: Mitta Sai Chaithanya <mittas@...rosoft.com>
Cc: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Nilesh Awate <Nilesh.Awate@...rosoft.com>,
	Ganesan Kalyanasundaram <ganesanka@...rosoft.com>,
	Pawan Sharma <sharmapawan@...rosoft.com>
Subject: Re: EXT4/JBD2 Not Fully Released device after unmount of NVMe-oF Block Device

On Sun, Jun 01, 2025 at 11:02:05AM +0000, Mitta Sai Chaithanya wrote:
> Hi Team,
>
> I'm encountering journal block device (JBD2) errors after unmounting
> a device and have been trying to trace the source of these errors.
> I've observed that these JBD2 errors only occur if the entries under
> /proc/fs/ext4/<device_name> or /proc/fs/jbd2/<device_name> still
> exist even after a successful unmount (the unmount command returns
> success).

What you are seeing is I/O errors, not jbd2 errors.  i.e.,

> 2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6

These errors may have been caused by the jbd2 layer issuing I/O
requests, but they are not failures of the jbd2 subsystem.  Rather,
ext4/jbd2 is apparently issuing I/Os after the NVMe-oF connection has
been torn down.

It appears that you are assuming that once the umount command/system
call has successfully returned, the kernel file system will be done
sending I/O requests to the block device.  This is simply not true.
For example, consider what happens if you do something like:

   # mount /dev/sda1 /mnt
   # mount --bind /mnt /mnt2
   # umount /mnt

The umount command will have returned successfully, but the ext4 file
system is still mounted, thanks to the bind mount.

And it's not just bind mounts.  If you have one or more processes in
a different mount namespace (created using clone(2) with the
CLONE_NEWNS flag), then so long as those processes are active, the
file system will stay active regardless of its having been unmounted
in the original mount namespace.

Internally, in the kernel, this is the distinction between the
"struct super_block" object and the "struct vfsmount" object.  The
umount(2) system call removes the vfsmount object from a mount
namespace and decrements its refcount.  The "struct super_block"
object cannot be freed so long as there is at least one vfsmount
object pointing at it.  So when you say that
/proc/fs/ext4/<device_name> still exists, that is an indication that
the "struct super_block" for that particular ext4 file system is
still alive, and so of course there can still be ext4 and jbd2 I/O
activity happening.

> I'd like to understand how to debug this issue further to determine
> the root cause. Specifically, I'm looking for guidance on what
> kernel-level references or subsystems might still be holding on to
> the journal or device structures post-unmount, and how to trace or
> identify them effectively (or) is this has fixed in latest versions
> of ext4?

I don't see any evidence of anything "wrong" that requires fixing in
the kernel.  It looks like something or someone assumed that the file
system was deactivated after the umount and then tore down the
NVMe-oF TCP connection, even though the file system was still active,
resulting in those errors.  But that's not a kernel bug; rather, it's
a bug in some human's understanding of how umount works in the
context of bind mounts and mount namespaces.

Cheers,

						- Ted
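P.S.  If you want to see what is still holding the file system open
before you tear down the NVMe-oF connection, a rough sketch (untested,
and assuming the device shows up as /dev/nvme0n1; substitute whatever
yours is actually called) is to search every process's view of the
mount table:

   # grep -l /dev/nvme0n1 /proc/*/mountinfo 2>/dev/null

Each /proc/<pid>/mountinfo reflects that process's mount namespace, so
this lists the pids that can still see a mount backed by the device,
which covers both the bind mount and the mount namespace cases above.
Note that it will not catch lazily detached mounts (umount -l) or open
file descriptors that are still pinning the superblock, so the more
reliable check is simply that /proc/fs/ext4/<device_name> and
/proc/fs/jbd2/<device_name> have disappeared before you drop the
NVMe-oF TCP connection.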