Date:	Fri, 24 Apr 2009 01:14:14 +0200
From:	Sylvain Rochet <gradator@...dator.net>
To:	Theodore Tso <tytso@....edu>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-ext4@...r.kernel.org, linux-nfs@...r.kernel.org
Subject: Re: Fw: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Wed, Apr 22, 2009 at 08:11:39PM -0400, Theodore Tso wrote:
> 
> On the server side, that means an inode table block also looks
> corrupted.  I'm pretty sure that if you used debugfs to examine those
> blocks you would have seen that the inodes were completely garbaged.

Yep, I destroyed all the evidence by running badblocks in read-write 
mode, but if we really need it we just have to put production back on 
the primary array and wait a few days.
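
For the record, once we get the primary array back, the inspection 
could look roughly like this (device name and inode number below are 
made up for illustration):

    # map a suspect inode number (12345 is a placeholder) to its inode table block
    debugfs -R "imap <12345>" /dev/sdb1
    # then hex-dump the block that imap reported (block number is a placeholder)
    debugfs -R "block_dump 1234" /dev/sdb1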


> Depending on the inode size, and assuming a 4k block size, there are
> typically 128 or 64 inodes in a 4k block,

4k block size
128 bytes/inode

so 32 inodes per 4k block in our case?

Since the new default is 256 bytes/inode and values smaller than 128 are 
not allowed, how is it possible to store 64 or 128 inodes in a 4k block? 
(Maybe I'm missing something :p)
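
Spelling out the arithmetic (a quick check; the real inode size of a 
filesystem can be read from the "Inode size:" line of dumpe2fs -h):

    $ echo $((4096 / 128))   # 128-byte inodes per 4k block
    32
    $ echo $((4096 / 256))   # 256-byte inodes (the new default) per 4k block
    16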


> so if you were to look at the inodes by inode number, you would normally 
> find that adjacent inodes are corrupted within a 4k block.  Of course, 
> this just tells us what had gotten damaged; whether it was damaged by 
> a kernel bug, a memory bug, or a hard drive or controller failure (and 
> there are multiple types of storage stack failures: complete garbage 
> getting written into the right place, and the right data getting 
> written into the wrong place).

Yes, it's not going to be easy to find out what is responsible, but 
based on which piece of hardware tends to fail most easily, let's 
point the finger at one of the hard drives :-)


> Well, sure, but any amount of corruption is extremely troubling....

Yep ;-)


> > By the way, if such corruption doesn't happen on the backup storage 
> > array, we can conclude it is a hardware problem around the primary one, 
> > but we are not going to be able to conclude for a few weeks.
> 
> Good luck!!

Thanks, actually this isn't so bad, we're glad to have backup hardware 
(the kind of thing we always consider useless until we -really- need it -- 
"Who said they like backups? I heard it from the back of the room." ;-)

By the way, the badblocks check is going to take 12 days at the current 
rate. However, I ran some data checks of the raid6 array in the past, 
mainly when the filesystem got corrupted, and every check succeeded. 
Maybe the raid6 driver computed fresh parity stripes from the corrupted 
data, which would make the checks pass anyway.
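
For completeness, this is the kind of check I mean, through the md 
sysfs interface (md0 is just a placeholder for our array):

    # ask the md layer to re-read every stripe and verify the parity
    echo check > /sys/block/md0/md/sync_action
    # nonzero after the check means data and parity disagree somewhere
    cat /sys/block/md0/md/mismatch_cnt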


Sylvain
