Message-ID:
 <BN9PR18MB4219FBD6D79413965DDEFA6D9812A@BN9PR18MB4219.namprd18.prod.outlook.com>
Date: Mon, 22 Sep 2025 11:11:15 +0000
From: Andrea Biardi <Andrea.Biardi@...visolutions.com>
To: linux-ext4 <linux-ext4@...r.kernel.org>
Subject: ext4: failed to convert unwritten extents (6.12.31 regression)

Hi All,

The CI process of a product that I'm working on involves the creation of a temporary KVM VM which boots a cdrom image containing a custom kernel + busybox in order to flash a filesystem image to /dev/vda, then shuts it down and exports the VM (that's my "deliverable" for the next stage).

For this custom kernel, I have used 6.6.x for a long time; after upgrading to 6.12, I started observing filesystem corruption in the deliverable image and these messages in dmesg (these are produced by the imaging kernel during flashing):

[   10.188754] EXT4-fs (vda2): mounted filesystem 42e94213-17de-4a91-9c58-c39852446bf2 r/w with ordered data mode. Quota mode: none.
[   11.612142] EXT4-fs (vda1): mounted filesystem e32da11b-d5d4-4621-a7d4-8b9bc5034c83 r/w with ordered data mode. Quota mode: none.
[  174.903010] I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903023] I/O error, dev vda, sector 167938 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903027] I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903031] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
[  174.903106] I/O error, dev vda, sector 169986 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903172] I/O error, dev vda, sector 172018 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903176] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 86009)
[  174.903239] I/O error, dev vda, sector 172034 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903297] I/O error, dev vda, sector 174066 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
[  174.903300] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 87033)
[  174.903371] I/O error, dev vda, sector 174082 op 0x1:(WRITE) flags 0x4000 phys_seg 254 prio class 0
[  174.903401] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  174.906697] Buffer I/O error on device vda1, logical block 84993
[  174.906708] Buffer I/O error on device vda1, logical block 84994
[  174.906710] Buffer I/O error on device vda1, logical block 84995
[  174.906712] Buffer I/O error on device vda1, logical block 84996
[  174.906716] Buffer I/O error on device vda1, logical block 84997
[  174.906718] Buffer I/O error on device vda1, logical block 84998
[  174.906719] Buffer I/O error on device vda1, logical block 84999
[  174.906721] Buffer I/O error on device vda1, logical block 85000
[  174.906723] Buffer I/O error on device vda1, logical block 85001
[  174.906724] Buffer I/O error on device vda1, logical block 85002
[  174.928451] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 83961)
[  174.928787] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  175.019677] EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 88169)
[  175.019752] EXT4-fs (vda1): failed to convert unwritten extents to written extents -- potential data loss!  (inode 16, error -5)
[  183.121276] EXT4-fs (vda1): unmounting filesystem e32da11b-d5d4-4621-a7d4-8b9bc5034c83.
[  183.711275] EXT4-fs (vda2): unmounting filesystem 42e94213-17de-4a91-9c58-c39852446bf2.
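One way to read the excerpt above is to pair each ext4 "starting block" with the block-layer sector errors. The sketch below uses a subset of the lines from the log (timestamps dropped) and checks whether block B has a matching failed write at sector 2*B. The factor of 2 is only an observation from these numbers; the filesystem block size and the partition offset are not stated in this report, so what the factor means is an assumption left open here. The function and variable names are made up for illustration.

```python
import re

# Subset of the dmesg lines quoted above, timestamps removed.
LOG = """\
I/O error, dev vda, sector 167922 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
I/O error, dev vda, sector 169970 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 84985)
I/O error, dev vda, sector 172018 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 86009)
I/O error, dev vda, sector 174066 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 0
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 87033)
Buffer I/O error on device vda1, logical block 84993
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 83961)
EXT4-fs warning (device vda1): ext4_end_bio:353: I/O error 10 writing to inode 16 starting block 88169)
"""

# Block-layer errors ("I/O error, dev vda, sector N op ...").
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+) op")
# ext4 writeback warnings ("... writing to inode I starting block B").
BLOCK_RE = re.compile(
    r"ext4_end_bio:\d+: I/O error \d+ writing to inode (\d+) starting block (\d+)"
)


def correlate(log, sectors_per_block):
    """Map each ext4 starting block to whether a sector error at
    block * sectors_per_block also appears in the log."""
    sectors = {int(m.group(2)) for m in SECTOR_RE.finditer(log)}
    result = {}
    for m in BLOCK_RE.finditer(log):
        block = int(m.group(2))
        result[block] = (block * sectors_per_block) in sectors
    return result


# sectors_per_block=2 is the ratio observed in this log, not a known setting.
matches = correlate(LOG, 2)
for block, found in matches.items():
    print(block, "matching sector error logged" if found else "no matching sector line")
```

A block without a matching sector line does not by itself mean anything went differently for that write; the quoted log may simply not contain every block-layer message.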

The relevant sequence of events inside the imaging VM is:
1) sfdisk /dev/vda (creates: vda1 for /boot, vda2 for the root filesystem)
2) mke2fs -t ext4 on both
3) mount at /mnt and /mnt/boot and rsync the source image (~100k files)
4) chroot to make a couple modifications, install grub and rebuild the initrd
5) shutdown
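For anyone trying to reproduce this, the five steps above could be sketched roughly as the command list below. This is a dry run that only prints the commands. The device names and mount points come from the steps; everything else is an assumption for illustration: the source path is a placeholder, `rsync -a`, `grub-install /dev/vda` and `dracut --force` stand in for whatever the real scripts invoke, and the partition layout passed to sfdisk and the "couple modifications" in the chroot are not shown because the report does not give them.

```python
import shlex


def imaging_commands(source="/path/to/source-image/", disk="/dev/vda"):
    """Return the imaging steps as argv lists (nothing is executed).

    `source` is a hypothetical placeholder; the report does not name it.
    """
    boot, root = disk + "1", disk + "2"
    return [
        # 1) partition the disk; the layout is read from stdin (not given in the report)
        ["sfdisk", disk],
        # 2) create both filesystems
        ["mke2fs", "-t", "ext4", boot],
        ["mke2fs", "-t", "ext4", root],
        # 3) mount root first, then /boot on top of it, and copy the image
        ["mount", root, "/mnt"],
        ["mkdir", "-p", "/mnt/boot"],
        ["mount", boot, "/mnt/boot"],
        ["rsync", "-a", source, "/mnt/"],  # rsync options are an assumption
        # 4) chroot work: bootloader and initrd (exact commands are assumptions)
        ["chroot", "/mnt", "grub-install", disk],
        ["chroot", "/mnt", "dracut", "--force"],
        # 5) unmount before shutting the VM down
        ["umount", "/mnt/boot"],
        ["umount", "/mnt"],
    ]


for argv in imaging_commands():
    print(shlex.join(argv))
```

The mount order (vda2 before vda1) matches the order of the "mounted filesystem" lines in the dmesg output above.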

The error I'm now seeing always occurs as a result of rebuilding the initrd, although I'm not sure why: the rsync certainly generates a lot more I/O over the 3 preceding minutes. As the sole purpose of this VM is to flash a filesystem image, nothing else is happening in the background.

I've done a rough bisection based on kernel releases and this problem occurs on 6.12.31 (212 out of 365 runs) and later, including 6.16.7 (6.12.30 is fine, just as 6.6.106 was).
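Since the failure is intermittent, it may help to put a number on how many clean runs are needed before a kernel or configuration can be called "fine". The sketch below takes the observed 212 failures in 365 runs as the per-run failure rate and assumes that runs are independent and that the rate is constant, which is a simplification; the function name is made up for illustration.

```python
import math


def clean_runs_needed(failure_rate, max_false_pass):
    """Smallest n such that n consecutive clean runs would happen with
    probability below max_false_pass if the bug were actually present.

    Assumes independent runs with a constant per-run failure rate.
    """
    pass_rate = 1.0 - failure_rate
    return math.ceil(math.log(max_false_pass) / math.log(pass_rate))


observed_rate = 212 / 365  # failures / runs on 6.12.31, from the report
runs_for_1_percent = clean_runs_needed(observed_rate, 0.01)

print(f"observed failure rate: {observed_rate:.1%}")
print(f"clean runs needed for <1% chance of a false pass: {runs_for_1_percent}")
```

Under those assumptions, a handful of consecutive clean runs is already fairly strong evidence, but only if the failure rate in the configuration being tested is comparable to the one measured on 6.12.31.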

Looking at the changelog for 6.12.31, commit 785ac699113320e3c3968754ca0c78d40a013107 "ext4: do not convert the unwritten extents if data writeback fails" stands out.

The configuration of the custom kernel used in the VM is fairly generic -- mostly a default x86_64 config with stuff that I don't need turned off: IPv6, sound, wireless, a few other bits.

I can rule out issues with the underlying hardware (tried on 3 different KVM hosts and nothing in host's dmesg either).

Also, I have a similar procedure (same custom kernel, same imaging scripts) that runs against ESXi and Hyper-V hypervisors (to create ESXi or Hyper-V VM images, respectively) and neither exhibits this problem (the notable difference, I suppose, is the block device being sda, i.e. not virtio).

For reasons that I don't understand, the regression occurs only if the imaging involves 2 distinct partitions / filesystems (boot and root). If I make a single partition/filesystem and mount that at /mnt, the error doesn't trigger. This may be a coincidence, however it's hard to ignore the fact that the file corruption always happens on the mounted /boot (that's where dracut writes the initrd), and in the single-partition case there's a single ext4 filesystem (disclaimer: haven't done hundreds of runs for this case).

Any ideas?

Thanks
Andrea.
