Message-ID: <7669753e-ec6c-421a-a132-3ae00b3b3db9@dybdal.dk>
Date: Tue, 1 Oct 2024 00:17:51 +0200
From: Jesper Dybdal <jd-ext4@...dal.dk>
To: linux-ext4@...r.kernel.org
Cc: Andreas Dilger <adilger@...ger.ca>
Subject: Re: Corrupted i_blocks field
On 2024-09-30 22:29, Andreas Dilger wrote:
> On Sep 27, 2024, at 8:38 AM, Jesper Dybdal <jd-ext4@...dal.dk> wrote:
>> I have now a few times experienced a problem with the i_blocks field of a few inodes being corrupted (replaced by extremely large numbers).
>>
>> I don't believe that it is a disk error - the file system is on a RAID1 partition and the RAID consistency is checked regularly.
>> I also find it hard to believe that it is a RAM error - the machine has run memtest86+ overnight without finding anything.
>>
>> The files I've seen corrupted are simple small text files that are modified only using an ordinary text editor (emacs).
>>
>> Fsck fixes it.
>> The system is an up-to-date Debian Bookworm:
>> Linux nuser 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
>>
>> I do one thing that is not the default for ext4: I use the "nodelalloc" option (because several years ago, there was a discussion about "delalloc or not" from which I got the impression that nodelalloc was probably slightly safer - if the resulting performance reduction is not a problem, which it is not for me):
>> /dev/md0 on / type ext4 (rw,relatime,nodelalloc,errors=remount-ro)
>>
>> Three examples follow below. Note that the bad field values, when interpreted as 48-bit signed numbers, are numerically small negative numbers (-25, -9, -3, respectively).
>>
>> Excerpts from the fsck logs:
>> root: Inode 10748715, i_blocks is 281474976710631, should be 5. FIXED.
>> root: Inode 10751288, i_blocks is 281474976710647, should be 3. FIXED.
>> root: Inode 10748542, i_blocks is 281474976710653, should be 1. FIXED.
>>
>> I don't know when the first two of these corruptions occurred, but the last one happened yesterday or the day before. The file in question was /etc/fstab, and I discovered the problem after I had edited fstab on Wednesday and rebooted on Thursday.
>>
>> The corrupted files can be read and copied without problems. I have not dared to delete any of those files before fsck had fixed them.
>>
>> What is going on here?
> This looks like an underflow of the used blocks count on the inode:
>
> 281474976710631 = 0xffffffffffe7
> 281474976710647 = 0xfffffffffff7
> 281474976710653 = 0xfffffffffffd
>
> This is 2^48 blocks, which is the limit for the number of blocks that fit
> into the available inode fields (32-bit i_blocks_lo, 16-bit i_blocks_hi).
>
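The hex values above sit just below 2^48, which matches the reading of them as small negative numbers in a 48-bit field. A minimal Python sketch of that arithmetic follows; the helper name `as_signed` is only illustrative, and the 48-bit width is taken from the i_blocks_lo/i_blocks_hi split described above.

```python
BITS = 48  # 32-bit i_blocks_lo + 16-bit i_blocks_hi, as described above


def as_signed(value, bits=BITS):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


# i_blocks values reported by fsck in the log excerpts
reported = [281474976710631, 281474976710647, 281474976710653]

for v in reported:
    print(f"{v} = {v:#x} -> {as_signed(v)}")
```

Each value is 2^48 minus a small count, which is what an unsigned counter looks like after being decremented below zero.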
> There is likely some kind of accounting error in the code. Is anything
> unusual with access patterns for those files (large xattrs/ACLs, are they
> files or directories or special files. mmap, truncate, fallocate, etc.)?
No. They are all simple small text configuration files, and I edit them
using Emacs. The only slightly unusual thing is, as I wrote earlier,
that the file system is mounted with the nodelalloc option.
The files I have identified are fstab and two postfix configuration
files: /etc/postfix/{main.cf,master.cf} . The problem has actually hit
master.cf twice.
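To look for further affected files without waiting for the next fsck, something like the following Python sketch might help. It assumes that the `st_blocks` value returned by stat reflects the on-disk i_blocks field; the function name `find_suspicious` and the 2^47 threshold are hypothetical choices, not anything taken from e2fsprogs.

```python
import os

# Hypothetical threshold: any block count with the top bit of a
# 48-bit field set is treated as a likely underflow.
SUSPICIOUS = 1 << 47


def find_suspicious(root):
    """Return (path, inode, st_blocks) for files with an implausible block count."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_blocks >= SUSPICIOUS:
                hits.append((path, st.st_ino, st.st_blocks))
    return hits
```

Calling `find_suspicious("/etc")` as root would list candidates. An empty result does not prove the file system is clean; it only means no file under that directory reports a block count in the underflow range.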
I have verified that the only reboot that happened between the fstab
edit on Wednesday and seeing the problem on Thursday was a clean,
deliberate reboot - no power outage or similar.
> If you are able to reproduce with the /etc/fstab editing, possibly strace
> could help to identify if something unusual is being done to the file.
I'll try, but I do not really expect Emacs to do strange things to the file.
> Cheers, Andreas
Thanks,
Jesper
--
Jesper Dybdal
https://www.dybdal.dk