Message-ID: <20250702143755.GB3471@mit.edu>
Date: Wed, 2 Jul 2025 10:37:55 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Jean-Louis Dupond <jean-louis@...ond.be>
Cc: linux-ext4@...r.kernel.org
Subject: Re: ext4 metadata corruption - snapshot related?

On Wed, Jul 02, 2025 at 03:43:25PM +0200, Jean-Louis Dupond wrote:
> We updated a machine to a newer 6.15.2-1.el8.elrepo.x86_64 kernel, and the
> same? bug reoccurred after some time:
>
> The error was the following:
> Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1): ext4_lookup:1791:
> inode #44962812: comm imap: deleted inode referenced: 44997932
> Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1): ext4_lookup:1791:
> inode #44962812: comm imap: deleted inode referenced: 44997932
> Jul 02 11:03:35 xxxxx kernel: EXT4-fs error (device sdd1): ext4_lookup:1791:
> inode #44962812: comm imap: deleted inode referenced: 44997932
> Jul 02 11:04:03 xxxxx kernel: EXT4-fs error (device sdd1): ext4_lookup:1791:
> inode #44962812: comm imap: deleted inode referenced: 44997932
>
> Any idea's on how this could be debugged further?

If it's correlated to snapshots, then I'd certainly be looking at
potential bugs in the hypervisor. We've also had a case where people
were looking for bugs in the guest kernel, but the problem ended up
being root-caused to a bug in the host kernel.

If moving from a 4.18 Cloudlinux 8 kernel to a 6.15.2 RHEL8 kernel
shows the same problem, then it does suggest that the problem isn't
with the guest kernel, but rather in the part of the setup which
didn't change (e.g., the host kernel and hypervisor).

Without a whole lot more details about what your workload might be,
what the host OS software might be, etc., it's really hard to make any
further suggestions. Are you running this on some kind of cloud
infrastructure (e.g., Microsoft Azure, Amazon AWS, Google Cloud,
etc.)? Something else? Have you tried running your workload on some
alternate infrastructure to see if the problem goes away with a
different cloud provider?

- Ted
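
One way to test the "correlated to snapshots" hypothesis is to line up the
EXT4-fs error timestamps against the times snapshots were taken. The
following is only a rough sketch, not something from this thread: the
function names are made up for illustration, the regular expression assumes
each error arrives as a single unwrapped journal line in the format quoted
above, and the snapshot times are assumed to come from the hypervisor's own
logs, which are not shown here.

```python
import re
from datetime import datetime, timedelta

# Assumption: one error per line, in the same format as the quoted log
# (the archive wraps each line in two; the journal itself does not).
EXT4_ERR = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<dir>\d+): comm (?P<comm>\S+): "
    r"deleted inode referenced: (?P<ino>\d+)$"
)


def parse_errors(lines, year):
    """Return one dict per matching log line.

    Syslog-style timestamps carry no year, so the caller supplies it.
    Month names are parsed with %b, which assumes an English/C locale.
    """
    errors = []
    for line in lines:
        m = EXT4_ERR.match(line.strip())
        if m is None:
            continue  # not a "deleted inode referenced" error
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        errors.append({
            "time": when,
            "device": m["dev"],
            "dir_inode": int(m["dir"]),   # directory being looked up
            "inode": int(m["ino"]),       # deleted inode it points to
            "comm": m["comm"],
        })
    return errors


def errors_after_snapshots(errors, snapshot_times, window):
    """Pair each error with the latest snapshot taken at most `window`
    before it, or with None if no snapshot falls in that window."""
    pairs = []
    for err in errors:
        candidates = [
            s for s in snapshot_times
            if timedelta(0) <= err["time"] - s <= window
        ]
        pairs.append((err, max(candidates) if candidates else None))
    return pairs
```

If most errors pair with a snapshot inside a short window and few or none
occur otherwise, that would support looking at the host side first. Note
that the error only shows when the stale directory entry was first looked
up, not when the corruption happened, so a generous window is probably
needed.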