Date:	Mon, 20 Apr 2009 12:43:37 +0100 (BST)
From:	Jeremy Sanders <jss@....cam.ac.uk>
To:	Theodore Tso <tytso@....edu>
cc:	linux-ext4@...r.kernel.org
Subject: Re: fsck.ext4: Group descriptors look bad... trying backup blocks...

On Mon, 20 Apr 2009, Theodore Tso wrote:

> On Mon, Apr 20, 2009 at 10:33:09AM +0100, Jeremy Sanders wrote:
>>
>> However, the system seems to mostly work, so I recreated the ext4 device,
>> I've just run my backup script again and fsck'd the device. It seems the
>> problem is reproducible with the new kernel:
>
> When you say reproducible, how many times have you tried it, and were
> you able to reproduce it every single time?  50% of time?  I do
> believe there is a problem, but we haven't been able to find something
> where it's easily reproducible.  So if you can easily reproduce this,
> this is definitely very exciting.

It takes a day or two to do the sync. I've only done it twice (once with 
the old kernel, once with the new Fedora testing kernel) and it happened 
both times. I'm afraid that is a rather small sample.

I did a different faster test (just copying my home directory lots of 
times), but I wasn't able to get it to fail. That test didn't use much 
disk space, however. Maybe it's worth just dd'ing a few TB of data onto 
the device and seeing whether that fails.
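
A minimal sketch of what such a bulk-write test could look like, assuming it 
writes to a file on the mounted filesystem rather than to the raw device 
(writing to /dev/md0 directly would overwrite the filesystem itself). The 
function names, chunk size and data pattern are illustrative assumptions, not 
something that has been run here. The read-back only checks file contents; 
the group descriptor problem would still have to be looked for by unmounting 
and running fsck afterwards.

```python
import hashlib
import os


def write_pattern(path, total_bytes, chunk_size=1 << 20, seed=0):
    """Write total_bytes of deterministic data to path; return its SHA-256."""
    digest = hashlib.sha256()
    written = 0
    counter = seed
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # Derive each chunk from a counter so the content is reproducible.
            block = hashlib.sha256(str(counter).encode()).digest()
            chunk = (block * (n // len(block) + 1))[:n]
            f.write(chunk)
            digest.update(chunk)
            written += n
            counter += 1
        # Push the data out of the page cache before returning.
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def read_digest(path, chunk_size=1 << 20):
    """Read path back in chunks and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling write_pattern() with a hypothetical path on the ext4 mount and a size 
of a few TB, then comparing its result with read_digest() on the same path, 
would show whether the data itself survived. The filesystem metadata would 
then be checked separately with fsck after unmounting.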

>> [root@...ck2 ~]# fsck /dev/md0
>> fsck 1.41.4 (27-Jan-2009)
>> e2fsck 1.41.4 (27-Jan-2009)
>> fsck.ext4: Group descriptors look bad... trying backup blocks...
>> Group descriptor 0 checksum is invalid.  Fix<y>?
>
> Do you have to reboot to see this, or is it enough to unmount the
> filesystem?  How big is the ext4 filesystem, and how big was the
> amount of data that you rsync'ed?  One thing that would be worth
> trying if you can easily reproduce is whether it happens on a single
> device disk, or whether it only shows up when you use a /dev/mdX
> device.

I didn't reboot this time - I did last time. I just unmounted the file 
system and fsck'd it. The filesystem is 8.2TB and the data is around 2.5TB.

The drives are on a 3ware card, so I could configure the card as a single 
raid5 device and try to reproduce it there. It may take a day or two to 
copy the data if I try this.

Jeremy

-- 
Jeremy Sanders <jss@....cam.ac.uk>   http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053