linux-ext4 - Re: fsck.ext4: Group descriptors look bad... trying backup blocks...

Message-ID: <49EC8B97.6010308@redhat.com>
Date:	Mon, 20 Apr 2009 09:49:59 -0500
From:	Eric Sandeen <sandeen@...hat.com>
To:	Theodore Tso <tytso@....edu>
CC:	Jeremy Sanders <jss@....cam.ac.uk>, linux-ext4@...r.kernel.org
Subject: Re: fsck.ext4: Group descriptors look bad... trying backup blocks...

Theodore Tso wrote:
> On Mon, Apr 20, 2009 at 12:43:37PM +0100, Jeremy Sanders wrote:
>> It takes a day or two to do the sync. I've only done it twice (once with
>> the old kernel, once with the new Fedora testing kernel) and it happened
>> both times. I'm afraid the sample size is rather small here.
>>
>> I did a different, faster test (just copying my home directory lots of
>> times), but I wasn't able to get it to fail. That test didn't use much
>> disk space, however. Maybe it's worth just dd'ing a few TB of data onto
>> the device and seeing whether that fails.
>>
>> I didn't reboot this time; I did last time. I just unmounted the file
>> system and fsck'd it. The filesystem is 8.2TB and the data is around
>> 2.5TB.

I think trying a filesystem with just under 8T would be a useful test too.
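
Why the 8T mark matters is not spelled out here; one plausible reading (an assumption, not stated in the thread) is that with 4 KiB blocks, 8 TiB is exactly 2^31 blocks, so anything larger has block numbers that no longer fit in a signed 32-bit integer. A small C sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of blocks in a filesystem of fs_bytes bytes (hypothetical helper). */
static uint64_t block_count(uint64_t fs_bytes, uint32_t block_size)
{
    assert(block_size > 0);
    return fs_bytes / block_size;
}

/* Block numbers run from 0 to count - 1, so the largest one overflows a
 * signed 32-bit value once count exceeds 2^31 (INT32_MAX + 1). */
static bool block_numbers_exceed_int32(uint64_t fs_bytes, uint32_t block_size)
{
    return block_count(fs_bytes, block_size) > (uint64_t)INT32_MAX + 1;
}
```

With 4 KiB blocks, exactly 8 TiB still fits (the highest block number is 2^31 - 1) while 8.2 TiB does not. If the 8.2TB figure is in decimal units instead, it falls just below the boundary, so which units are meant matters for this test.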

> That's useful data.  I wish we could make it fail more quickly
> on a smaller rsync, but the fact that you didn't need to reboot is
> definitely useful information.
> 
> And this is a fresh rsync, so no files were being deleted; rsync should
> have just been writing new files to .filename.XXXXX and then renaming
> each one to filename when it was done, right?
> 
> OK, let me think about this a little.  I think we can create a patch
> which checks for writes to the block group descriptors and dumps a
> stack trace.  That would allow us to catch the failing code in question
> in the act, and maybe figure out what is going on.
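
A check like the one Ted proposes might look roughly like the following C sketch. It is not ext4 code: the struct, field names, and helpers are hypothetical; it assumes a classic (non-meta_bg) layout where the primary descriptor table starts in the block after the one holding the superblock; and it covers only the primary copy. Because descriptors are legitimately rewritten whenever group counters change, the sketch only flags writes that did not come through an expected descriptor-update path. In the kernel, the report would presumably be a dump_stack() call rather than fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of superblock geometry (not the kernel's struct). */
struct fs_geometry {
    uint64_t first_data_block; /* 1 with 1 KiB blocks, 0 with larger blocks */
    uint32_t block_size;       /* bytes per block */
    uint32_t desc_size;        /* bytes per group descriptor, e.g. 32 or 64 */
    uint64_t group_count;      /* number of block groups */
};

/* Primary descriptor table: starts in the block after the superblock's block
 * and spans ceil(group_count * desc_size / block_size) blocks (no meta_bg). */
static void gdt_range(const struct fs_geometry *g, uint64_t *first, uint64_t *count)
{
    uint64_t bytes = g->group_count * g->desc_size;

    assert(g->block_size > 0);
    *first = g->first_data_block + 1;
    *count = (bytes + g->block_size - 1) / g->block_size;
}

/* Returns true and reports if a write to `block` lands in the primary
 * descriptor table without coming from an expected descriptor update. */
static bool suspicious_gdt_write(const struct fs_geometry *g, uint64_t block,
                                 bool descriptor_update, const char *caller)
{
    uint64_t first, count;

    if (descriptor_update)
        return false;
    gdt_range(g, &first, &count);
    if (block >= first && block < first + count) {
        /* A kernel version would presumably call dump_stack() here. */
        fprintf(stderr, "unexpected write to descriptor block %llu from %s\n",
                (unsigned long long)block, caller);
        return true;
    }
    return false;
}
```

For example, with 4 KiB blocks, 64-byte descriptors and 70,000 groups, the table spans blocks 1 through 1094, so a stray write to any of those blocks from an unrelated path would be reported.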

XFS has block-zero tests, because there was once a bug where
uninitialized block numbers in buffers were clobbering the superblock at
block 0.  It was helpful, so I think this is a good idea, Ted.
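
A block-zero test in that spirit can be sketched the same way. Again, the names are hypothetical and this is not XFS's actual code: a buffer whose block number was never set (zero-initialized) targets block 0, so the write is refused unless the caller marks it as the genuine superblock write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical buffer descriptor; a zero-initialized one has blocknr == 0. */
struct buf {
    uint64_t blocknr;
    bool     is_superblock; /* set only by code that legitimately writes block 0 */
};

/* Refuse a write to block 0 unless the caller says it is the superblock,
 * since an uninitialized block number would otherwise clobber it. */
static bool write_allowed(const struct buf *b)
{
    assert(b != NULL);
    if (b->blocknr == 0 && !b->is_superblock) {
        fprintf(stderr, "refusing write to block 0: block number probably uninitialized\n");
        return false;
    }
    return true;
}
```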

-Eric