linux-ext4 - Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)


Message-ID: <20100512150057.GA29867@atrey.karlin.mff.cuni.cz>
Date:	Wed, 12 May 2010 17:00:57 +0200
From:	Jan Kara <jack@...e.cz>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	Saeed Bishara <saeed@...vell.com>,
	Nicolas Pitre <nico@...vell.com>, linux-ext4@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	"James E.J. Bottomley" <jejb@...isc-linux.org>
Subject: Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell
	88f5182)

> On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:
> 
> > Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> > I tend to suspect things could be related to the infamous vivt caches. On the
> > other hand, it's pretty clearly metadata or journal corruption and I'm not
> > sure we ever do things that could cause aliases (such as vmap etc.) on
> > these things, and they shouldn't be mapped into userspace... unless it's fsck
> > itself that causes aliases to occur at the block device level? (I do unmount
> > though before I run fsck).
> > 
> > On the other hand, it could also be a busticated marvell SATA driver :-)
> > 
> > I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> > on an out of tree variant of a Marvell originated BSP, so everything is
> > completely different, especially in the area of drivers for the chipset.
> > 
> > Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> > kids permit.
> > 
> > In the meantime, any hint appreciated.
> 
> Another quick test that gives more info, using a smaller (about 5GB)
> partition and no md or raid involved:
> 
>  - Boot with NFS root
>  - mkfs /dev/sdb2 (no md or raid involved)
>  - mount /dev/sdb2 /mnt/test
>  - rsync -avx /test-stuff /mnt/test
>  - cd /mnt/test
>  - md5sum -c ~/test-stuff-sums.txt
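How ~/test-stuff-sums.txt was generated is not stated. One plausible way is to hash every regular file in the source tree before copying it. Below is a minimal Python sketch under that assumption. The script name make_sums.py and both function names are hypothetical. Paths are written relative to the tree root with a leading "./", matching the paths in the md5sum output quoted below. Symlinks are skipped for simplicity, and filenames containing newlines or backslashes (which GNU md5sum escapes) are not handled.

```python
import hashlib
import os
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Return md5sum-style lines ("<digest>  ./<relpath>") for every
    regular file under root, in a stable sorted order.
    Symlinks are skipped to keep this sketch simple."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            # Two spaces between digest and path, as md5sum writes in text mode.
            lines.append(f"{md5_of_file(full)}  ./{rel}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python make_sums.py /test-stuff > ~/test-stuff-sums.txt
    for line in build_manifest(sys.argv[1]):
        print(line)
```

The resulting file can then be fed to `md5sum -c` from the directory that corresponds to the tree root.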
> 
> That gives me a whole bunch of:
> 
> md5sum: ./usr/bin/debconf-escape: No such file or directory
> ./usr/bin/debconf-escape: FAILED open or read
> ./usr/bin/stat: OK
> md5sum: ./usr/bin/chrt: No such file or directory
> ./usr/bin/chrt: FAILED open or read
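GNU md5sum reports two kinds of failure. "FAILED" means the file was read but its checksum differs. "FAILED open or read" means the file could not be opened at all. Every failure quoted here is of the second kind, together with "No such file or directory". That suggests name lookups are failing, rather than file contents being corrupted. Below is a minimal Python sketch that sorts a saved `md5sum -c` log into these categories. It assumes GNU coreutils output format, and the function name summarize is hypothetical.

```python
import re

# Per-file result lines from `md5sum -c` look like "<path>: OK",
# "<path>: FAILED" or "<path>: FAILED open or read".
_RESULT = re.compile(r"^(?P<path>.+): (?P<status>OK|FAILED(?: open or read)?)$")


def summarize(lines):
    """Split md5sum -c output into OK, checksum-mismatch and unreadable paths.
    Diagnostic lines prefixed with "md5sum: " (stderr) are ignored."""
    ok, mismatched, unreadable = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("md5sum: "):
            continue
        m = _RESULT.match(line)
        if not m:
            continue
        path, status = m.group("path"), m.group("status")
        if status == "OK":
            ok.append(path)
        elif status == "FAILED":
            mismatched.append(path)  # data read, but checksum differs
        else:
            unreadable.append(path)  # open or read failed, e.g. ENOENT
    return {"ok": ok, "mismatched": mismatched, "unreadable": unreadable}
```

For example, save the log with `md5sum -c ~/test-stuff-sums.txt > check.log 2>&1` and pass `open("check.log")` to summarize(). An empty "mismatched" list together with a long "unreadable" list would match the pattern shown above.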
  Could you get the filesystem image with 'e2image -r /dev/sdb2 buggy-image',
bzip2 it, and make it available somewhere? Maybe I can infer something
from the way the filesystem gets corrupted.
  Oh, and also overwrite the partition with zeros before calling mkfs to make
the analysis simpler.

> In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
> ls /mnt/test/usr/bin/chrt then I get No such file or directory.
> 
> So something is badly wrong :-)
> 
> Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index),
> it works fine. All my md5sums are correct and fsck passes.
  Funny. Not sure how that could happen...
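The symptom described above is that a name shows up in a directory listing, yet looking it up by path fails. That can be checked across the whole copied tree. Below is a minimal Python sketch, using the hypothetical helper name unresolvable_entries. It lists each directory and then calls lstat() on every name the listing returned. It reports names that readdir produced but a lookup by path cannot find. It only detects the symptom and says nothing about why listing and lookup disagree.

```python
import os


def unresolvable_entries(root):
    """Walk root and return sorted paths that appear in a directory listing
    but whose lookup by name fails with FileNotFoundError (ENOENT)."""
    bad = []
    # os.walk silently skips subdirectories it cannot open (onerror=None),
    # but their names still appear in dirnames and are checked below.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # lookup by name; does not follow symlinks
            except FileNotFoundError:
                bad.append(path)
    return sorted(bad)
```

For example, running unresolvable_entries("/mnt/test") on a filesystem made with the default options, and again on one made with -O ^dir_index, could show whether the disagreement only appears when directories are indexed.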

								Honza
-- 
Jan Kara <jack@...e.cz>
SuSE CR Labs