Message-ID:
 <TYZP153MB06279836B028CF36EB7ED260D761A@TYZP153MB0627.APCP153.PROD.OUTLOOK.COM>
Date: Sun, 1 Jun 2025 11:02:05 +0000
From: Mitta Sai Chaithanya <mittas@...rosoft.com>
To: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
CC: Nilesh Awate <Nilesh.Awate@...rosoft.com>, Ganesan Kalyanasundaram
	<ganesanka@...rosoft.com>, Pawan Sharma <sharmapawan@...rosoft.com>
Subject: EXT4/JBD2 Not Fully Releasing Device After Unmount of NVMe-oF Block Device

Hi Team,
           I'm encountering journal block device (JBD2) errors after unmounting a device and have been trying to trace their source. I've observed that these JBD2 errors occur only when the entries under /proc/fs/ext4/<device_name> or /proc/fs/jbd2/<device_name> still exist after a successful unmount (the umount command returns success).

For context: the block device (/dev/nvme0n1) is connected over NVMe-oF TCP to a remote target. I'm confident that no I/O is stuck on the target side, as there are no related I/O errors or warnings in the kernel logs where the target is connected.
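
For completeness, here is roughly how I have been checking on the initiator side that nothing is still in flight (a minimal sketch; the device and controller names are assumed for this node):

cat /sys/block/nvme0n1/inflight      # in-flight reads/writes queued against the block device, expected "0 0"
cat /sys/class/nvme/nvme0/state      # NVMe-oF controller state, e.g. "live" vs. "connecting"
nvme list                            # namespace still visible to nvme-cli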

However, the /proc entries mentioned above remain even after a successful unmount, and this seems to correlate with the journal-related errors.

I'd like to understand how to debug this further and determine the root cause. Specifically, I'm looking for guidance on which kernel-level references or subsystems might still be holding on to the journal or device structures after unmount, how to trace or identify them effectively, and whether this has already been fixed in more recent ext4 versions.
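
For reference, this is the kind of userspace-level checking I can do on this node to look for something that still references the device or superblock (a rough sketch, not meant to be exhaustive):

grep -l nvme0n1 /proc/*/mountinfo 2>/dev/null   # any process whose mount namespace still has the filesystem mounted
ls /sys/block/nvme0n1/holders                   # stacked devices (e.g. device-mapper) still holding the block device
lsof /dev/nvme0n1                               # userspace opens of the raw block device
ls /sys/fs/ext4/nvme0n1/                        # ext4 sysfs directory, which I expect to disappear together with the /proc entries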

Proc entries exist even after unmount:
root@...-nodepool1-44537149-vmss000002 [ / ]# ls /proc/fs/ext4/nvme0n1/
es_shrinker_info  fc_info  mb_groups  mb_stats  mb_structs_summary  options
root@...-nodepool1-44537149-vmss000002 [ / ]# ls /proc/fs/jbd2/nvme0n1-8/
info


Active processes associated with the unmounted device:
root      636845  0.0  0.0      0     0 ?        S    08:43   0:03 [jbd2/nvme0n1-8]
root      636987  0.0  0.0      0     0 ?        I<   08:43   0:00 [dio/nvme0n1]
root      699903  0.0  0.0      0     0 ?        I    09:18   0:01 [kworker/u16:1-nvme-wq]
root      761100  0.0  0.0      0     0 ?        I<   09:50   0:00 [kworker/1:1H-nvme_tcp_wq]
root      763896  0.0  0.0      0     0 ?        I<   09:52   0:00 [kworker/0:0H-nvme_tcp_wq]
root      779007  0.0  0.0      0     0 ?        I<   10:01   0:00 [kworker/0:1H-nvme_tcp_wq]


Stack traces of the processes (after unmount):

root@...-nodepool1-44537149-vmss000002 [ / ]# cat /proc/636845/stack
[<0>] kjournald2+0x219/0x270
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30
root@...-nodepool1-44537149-vmss000002 [ / ]# cat /proc/636846/stack
[<0>] rescuer_thread+0x2db/0x3b0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30


 [ / ]# cat /proc/636987/stack
[<0>] rescuer_thread+0x2db/0x3b0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/699903/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/761100/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

 [ / ]# cat /proc/763896/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30

[ / ]# cat /proc/779007/stack
[<0>] worker_thread+0xcd/0x3d0
[<0>] kthread+0x12a/0x150
[<0>] ret_from_fork+0x22/0x30
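
If it helps, I can also try to confirm via ftrace whether the journal/superblock teardown paths ever ran during the unmount (a sketch only; I am assuming these function names are traceable on this kernel):

cd /sys/kernel/debug/tracing
echo 'ext4_put_super jbd2_journal_destroy kill_block_super' > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
# ... reproduce: mount, run I/O, unmount ...
cat trace
echo 0 > tracing_on
echo nop > current_tracer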


Kernel Logs:

2025-06-01T10:01:11.568304+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346875] nvme nvme0: Failed reconnect attempt 6
2025-06-01T10:01:11.568330+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30452.346881] nvme nvme0: Reconnecting in 10 seconds...
2025-06-01T10:01:21.814134+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596133] nvme nvme0: Connect command failed, error wo/DNR bit: 6
2025-06-01T10:01:21.814165+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596186] nvme nvme0: failed to connect queue: 0 ret=6
2025-06-01T10:01:21.814174+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596289] nvme nvme0: Failed reconnect attempt 7
2025-06-01T10:01:21.814176+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30462.596292] nvme nvme0: Reconnecting in 10 seconds...
2025-06-01T10:01:32.055063+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.836929] nvme nvme0: queue_size 128 > ctrl sqsize 64, clamping down
2025-06-01T10:01:32.055094+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.837002] nvme nvme0: creating 2 I/O queues.
2025-06-01T10:01:32.108286+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.886546] nvme nvme0: mapped 2/0/0 default/read/poll queues.
2025-06-01T10:01:32.108313+00:00 aks-nodepool1-44537149-vmss000002 kernel: [30472.887450] nvme nvme0: Successfully reconnected (8 attempt)
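
While the controller is in this reconnecting state I can also dump blocked tasks, in case umount or the jbd2 thread is waiting on I/O to the reconnecting controller (assuming sysrq is enabled on the node):

echo 1 > /proc/sys/kernel/sysrq      # enable all sysrq functions if not already enabled
echo w > /proc/sysrq-trigger         # dump tasks in uninterruptible (blocked) state
dmesg | tail -n 100                  # stacks of the blocked tasks appear in the kernel log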

High-level information about the ext4 filesystem:

root@...-nodepool1-44537149-vmss000002 [ / ]# dumpe2fs /dev/nvme0n1
dumpe2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   <none>
Last mounted on:          /datadir
Filesystem UUID:          1a564b4d-8f34-4f71-8370-802a239e350a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index FEATURE_C12 filetype needs_recovery extent 64bit flex_bg metadata_csum_seed sparse_super large_file huge_file dir_nlink extra_isize metadata_csum FEATURE_R16
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              655360
Block count:              2620155
Reserved block count:     131007
Overhead clusters:        66747
Free blocks:              454698
Free inodes:              655344
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stripe width:        32
Flex block group size:    16
Filesystem created:       Sun Jun  1 08:36:28 2025
Last mount time:          Sun Jun  1 08:43:57 2025
Last write time:          Sun Jun  1 08:43:57 2025
Mount count:              4
Maximum mount count:      -1
Last checked:             Sun Jun  1 08:36:28 2025
Check interval:           0 (<none>)
Lifetime writes:          576 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      22fed392-1993-4796-a996-feab145379ba
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xea839b0c
Checksum seed:            0x8e742ce9
Journal features:         journal_64bit journal_checksum_v3
Total journal size:       64M
Total journal blocks:     16384
Max transaction length:   16384
Fast commit length:       0
Journal sequence:         0x000002a0
Journal start:            6816
Journal checksum type:    crc32c
Journal checksum:         0xa35736ab


Thanks & Regards,
Sai
