Message-ID: <20250601220418.GC179983@mit.edu>
Date: Sun, 1 Jun 2025 22:04:18 +0000
From: "Theodore Ts'o" <tytso@....edu>
To: Mitta Sai Chaithanya <mittas@...rosoft.com>
Cc: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Nilesh Awate <Nilesh.Awate@...rosoft.com>,
	Ganesan Kalyanasundaram <ganesanka@...rosoft.com>,
	Pawan Sharma <sharmapawan@...rosoft.com>
Subject: Re: EXT4/JBD2 Not Fully Released device after unmount of NVMe-oF Block Device

On Sun, Jun 01, 2025 at 11:02:05AM +0000, Mitta Sai Chaithanya wrote:
> Hi Team,
>
> I'm encountering journal block device (JBD2) errors after unmounting
> a device and have been trying to trace the source of these errors.
> I've observed that these JBD2 errors only occur if the entries under
> /proc/fs/ext4/<device_name> or /proc/fs/jbd2/<device_name> still
> exist even after a successful unmount (the unmount command returns
> success).

What you are seeing is I/O errors, not jbd2 errors.  i.e.,

> 2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6

These errors may have been caused by the jbd2 layer issuing I/O
requests, but they are not failures of the jbd2 subsystem.  Rather,
ext4/jbd2 is apparently issuing I/Os after the NVMe-oF connection has
been torn down.

It appears that you are assuming that once the umount command/system
call has successfully returned, the kernel file system will be done
sending I/O requests to the block device.  This is simply not true.
For example, consider what happens if you do something like:

   # mount /dev/sda1 /mnt
   # mount --bind /mnt /mnt2
   # umount /mnt

The umount command will have returned successfully, but the ext4 file
system is still mounted, thanks to the bind mount.

And it's not just bind mounts.  If you have one or more processes in
a different mount namespace (created using clone(2) with the
CLONE_NEWNS flag), then so long as those processes are active, the
file system will stay active regardless of its having been unmounted
in the original mount namespace.

Internally, in the kernel, this is the distinction between the
"struct super_block" object and the "struct vfsmount" object.  The
umount(2) system call removes the vfsmount object from a mount
namespace and decrements its refcount.  The "struct super_block"
object cannot be freed so long as there is at least one vfsmount
object pointing at it.  So when you say that
/proc/fs/ext4/<device_name> still exists, that is an indication that
the "struct super_block" for that particular ext4 file system is
still alive, and so of course there can still be ext4 and jbd2 I/O
activity happening.

> I'd like to understand how to debug this issue further to determine
> the root cause. Specifically, I'm looking for guidance on what
> kernel-level references or subsystems might still be holding on to
> the journal or device structures post-unmount, and how to trace or
> identify them effectively (or) is this has fixed in latest versions
> of ext4?

I don't see any evidence of anything "wrong" that requires fixing in
the kernel.  It looks like something or someone assumed that the file
system was deactivated after the umount and then tore down the
NVMe-oF TCP connection, even though the file system was still active,
resulting in those errors.  But that's not a kernel bug; rather, it's
a bug in some human's understanding of how umount works in the
context of bind mounts and mount namespaces.

Cheers,

						- Ted
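P.S.  If you want to see what is still holding the file system open
before you tear down the NVMe-oF connection, a rough sketch (untested,
and assuming the device shows up as /dev/nvme0n1; substitute whatever
yours is actually called) is to search every process's view of the
mount table:

   # grep -l /dev/nvme0n1 /proc/*/mountinfo 2>/dev/null

Each /proc/<pid>/mountinfo reflects that process's mount namespace, so
this lists the pids that can still see a mount backed by the device,
which covers both the bind mount and the mount namespace cases above.
Note that it will not catch lazily detached mounts (umount -l) or open
file descriptors that are still pinning the superblock, so the more
reliable check is simply that /proc/fs/ext4/<device_name> and
/proc/fs/jbd2/<device_name> have disappeared before you drop the
NVMe-oF TCP connection.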