Message-id: <20080414234059.GM3106@webber.adilger.int>
Date: Mon, 14 Apr 2008 17:40:59 -0600
From: Andreas Dilger <adilger@....com>
To: Mingming Cao <cmm@...ibm.com>
Cc: Andi Kleen <andi@...stfloor.org>, linux-ext4@...r.kernel.org
Subject: Re: ext3_valid_block_bitmap: Invalid block bitmap in 2.6.25rc in memory
On Apr 14, 2008 07:50 -0700, Mingming Cao wrote:
> On Sat, 2008-04-12 at 22:57 +0200, Andi Kleen wrote:
> > FYI, a system here running various 2.6.25rc kernels (latest up to rc7-git6)
> > with longer uptimes suddenly decided to fsck one of its file systems
> > due to an error after reboot.
> >
> > The error causing this was:
> >
> > kernel: EXT3-fs error (device dm-0): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 285, block = 9338882
> >
> > detected by the 2.6.25rc7-git6 kernel.
> >
> > I don't see any ill effects from it and fsck didn't find anything wrong
> > so it must have been something spurious in memory only (or fsck
> > fails to check for this condition, but that is hard to imagine)
>
> ext3_valid_block_bitmap() checks whether the block or inode bitmap
> block is marked as "used" in the block group bitmap, to prevent
> allocating blocks from these system metadata blocks.
Right.
> The error message seems to indicate that some of the block group
> metadata is corrupted, but I don't know why fsck doesn't catch this.
> Andreas?
It might have been corrupted on read (e.g. bad cable, or bad/wrong
data read from disk the first time), in which case a later read of
the same block, such as the one fsck does, would see good data.
The message itself isn't very useful though. It should report what it
thinks is wrong with the bitmap (e.g. whether block/inode bitmaps are
unallocated, which/how many itable blocks are unallocated).
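
To illustrate the kind of report meant here, a minimal userspace
sketch follows. It is not the kernel code: struct group_layout,
struct bitmap_report, check_group_bitmap() and format_report() are
hypothetical names made up for this example, and it assumes all of a
group's metadata blocks live inside that group.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one block group (not the on-disk
 * ext3 descriptor). All block numbers are absolute. */
struct group_layout {
    uint64_t first_block;       /* first block of this group */
    uint64_t blocks_per_group;
    uint64_t block_bitmap;      /* location of the block bitmap */
    uint64_t inode_bitmap;      /* location of the inode bitmap */
    uint64_t inode_table;       /* first inode table block */
    uint64_t itable_blocks;     /* number of inode table blocks */
};

/* What the check found, so the caller can say what is wrong. */
struct bitmap_report {
    int block_bitmap_free;      /* 1 if the block bitmap block is marked free */
    int inode_bitmap_free;      /* 1 if the inode bitmap block is marked free */
    uint64_t itable_free_count; /* inode table blocks marked free */
    uint64_t first_itable_free; /* meaningful only if the count is nonzero */
};

/* Little-endian bit order: bit n lives in byte n / 8, bit n % 8. */
static int block_in_use(const struct group_layout *g,
                        const uint8_t *bitmap, uint64_t block)
{
    uint64_t bit;

    assert(block >= g->first_block);
    bit = block - g->first_block;
    assert(bit < g->blocks_per_group);
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}

/* Returns the number of metadata blocks the bitmap wrongly shows as free. */
static uint64_t check_group_bitmap(const struct group_layout *g,
                                   const uint8_t *bitmap,
                                   struct bitmap_report *r)
{
    uint64_t i;

    r->block_bitmap_free = !block_in_use(g, bitmap, g->block_bitmap);
    r->inode_bitmap_free = !block_in_use(g, bitmap, g->inode_bitmap);
    r->itable_free_count = 0;
    r->first_itable_free = 0;

    for (i = 0; i < g->itable_blocks; i++) {
        if (block_in_use(g, bitmap, g->inode_table + i))
            continue;
        if (r->itable_free_count++ == 0)
            r->first_itable_free = g->inode_table + i;
    }
    return (uint64_t)r->block_bitmap_free + (uint64_t)r->inode_bitmap_free +
           r->itable_free_count;
}

/* Formats a message that names each problem instead of a single block. */
static int format_report(char *buf, size_t len, unsigned group,
                         const struct bitmap_report *r)
{
    return snprintf(buf, len,
                    "Invalid block bitmap - block_group = %u: "
                    "block bitmap %s, inode bitmap %s, "
                    "%llu itable block(s) unallocated",
                    group,
                    r->block_bitmap_free ? "unallocated" : "ok",
                    r->inode_bitmap_free ? "unallocated" : "ok",
                    (unsigned long long)r->itable_free_count);
}
```

A caller would run check_group_bitmap() on the freshly read bitmap
and only build the message with format_report() when the return
value is nonzero.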
> Mingming
> > The system never showed anything like this on earlier kernel versions.
This is a new check, to catch allocation bitmap corruption before it
causes the corruption to spread into the rest of the filesystem by
double-allocating blocks, etc. Having a checksum would also be good,
but even then memory corruption can lead to a valid checksum of bad
data in memory, so a validity check is still useful for such important
and rarely-read data.
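
A small sketch of why the checksum alone is not enough. Fletcher-16
is only a stand-in here, not a proposal for the on-disk format, and
struct bitmap_buf, seal(), csum_ok() and metadata_bit_set() are
hypothetical names for this example. If a bit flips in memory before
the checksum is computed, the checksum matches the bad data, and only
the validity check notices.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over a byte buffer; a stand-in checksum for illustration. */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t a = 0, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (uint16_t)((a + data[i]) % 255);
        b = (uint16_t)((b + a) % 255);
    }
    return (uint16_t)((b << 8) | a);
}

/* Hypothetical in-memory bitmap with an attached checksum. */
struct bitmap_buf {
    uint8_t bits[16];
    uint16_t csum;
};

/* Compute the checksum over whatever is in memory right now. */
static void seal(struct bitmap_buf *b)
{
    b->csum = fletcher16(b->bits, sizeof b->bits);
}

/* 1 if the stored checksum matches the current contents. */
static int csum_ok(const struct bitmap_buf *b)
{
    return b->csum == fletcher16(b->bits, sizeof b->bits);
}

/* Validity check: is the bit for a known metadata block still set? */
static int metadata_bit_set(const struct bitmap_buf *b, unsigned bit)
{
    return (b->bits[bit / 8] >> (bit % 8)) & 1;
}
```

With bits 0..2 standing for metadata blocks, clearing bit 1 and then
calling seal() leaves csum_ok() returning 1 while metadata_bit_set(b, 1)
returns 0. Clearing the bit after seal() is the case the checksum does
catch.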
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.