linux-kernel - Re: XFS filesystem corruption on the arm(el) architecture

Message-ID: <48F89E0F.6030307@sandeen.net>
Date:	Fri, 17 Oct 2008 09:15:43 -0500
From:	Eric Sandeen <sandeen@...deen.net>
To:	Martin Michlmayr <tbm@...ius.com>
CC:	Tobias Frost <tobi@...dtobi.de>, linux-kernel@...r.kernel.org,
	debian-arm@...ts.debian.org, xfs@....sgi.com
Subject: Re: XFS filesystem corruption on the arm(el) architecture

Martin Michlmayr wrote:
> * Eric Sandeen <sandeen@...deen.net> [2008-10-16 17:13]:
>> So is this a regression?  did it used to work?  If so, when? :)
> 
> The original report was with 2.6.18 but that was with the old ABI:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=423562
> I just installed a 2.6.22 kernel with EABI and I can also trigger
> the bug.  So it's not a (recent) regression.
> 
>> What's a little odd is that the buffer it dumped out looks like the
>> beginning of a perfectly valid superblock for your filesystem
>> (magic, block size, and block count all match).   If you printk the
>> "bno" variable right around line 2106 in xfs_da_btree.c, can you see
>> what you get?
> 
> bno is 0.

OK, that's a little odd.  It correlates with the "bad" magic that was
seen, because block 0 is the superblock, but it doesn't make sense because
we were, in theory, trying to read a directory leaf block.
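
For anyone who wants to check a dumped buffer by hand, here is a minimal
sketch of the "does this look like a superblock" test.  It assumes the
standard XFS on-disk layout, where the superblock starts with a big-endian
32-bit magic ("XFSB"), a 32-bit block size and a 64-bit data block count.
The function names and the sample buffer are made up for illustration; this
is not code from the kernel or from xfsprogs.

```python
import struct

XFS_SB_MAGIC = 0x58465342  # the ASCII bytes "XFSB" read as a big-endian u32


def parse_sb_header(buf: bytes):
    """Return (magic, blocksize, dblocks) from the first 16 bytes of buf.

    XFS metadata is big-endian on disk regardless of host CPU, so the
    format string uses '>' (no padding, big-endian).
    """
    if len(buf) < 16:
        raise ValueError("need at least 16 bytes")
    return struct.unpack(">IIQ", buf[:16])


def looks_like_superblock(buf: bytes) -> bool:
    """True if the buffer begins with the XFS superblock magic."""
    magic, _blocksize, _dblocks = parse_sb_header(buf)
    return magic == XFS_SB_MAGIC


# Hypothetical sample buffer (illustrative values, not from the report):
# magic "XFSB", 4096-byte blocks, 262144 data blocks, then zero padding.
sample = b"XFSB" + struct.pack(">IQ", 4096, 262144) + bytes(16)

if __name__ == "__main__":
    print(looks_like_superblock(sample), parse_sb_header(sample))
```

If the buffer handed back for a directory leaf read passes this check, then
the read really did return block 0 rather than a damaged leaf block.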

If you unmount & remount, does the ls work then?

>> creating an xfs_metadump of the filesystem for examination on a
>> non-arm box might also be interesting.
> 
> http://www.cyrius.com/tmp/dump5
> (11 MB)

Thanks.

xfs_repair on x86 shows no errors; however, the filesystem won't mount
normally (bad log clientid), although mount -o norecovery,ro and a subsequent
ls work fine.  (At first I thought the filenames were badly scrambled, but
then I remembered that xfs_metadump does this by default ;))

The remaining problem that I know of on some arm architectures is a vmap
cache aliasing problem that usually shows up as log corruption; that may
explain the bad clientid, but I'm not sure why we're reading block 0 above.

Do you know what cachepolicy you're booted with?  If it's writeallocate,
you might try cachepolicy=writeback, otherwise try cachepolicy=uncached
(which will be horribly slow) and see if the problem goes away or not;
it'd be a clue.
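
A quick way to see whether a cachepolicy was passed at boot is to look at
the kernel command line.  The sketch below assumes /proc/cmdline is readable
on the box; the helper names are hypothetical.  Note that if no cachepolicy=
option is present the kernel uses its built-in default, so a result of None
only says the option wasn't given, not which policy is in effect.

```python
def cachepolicy_from_cmdline(cmdline: str):
    """Return the value of the cachepolicy= boot option, or None if absent."""
    for token in cmdline.split():
        if token.startswith("cachepolicy="):
            return token.split("=", 1)[1]
    return None


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    policy = cachepolicy_from_cmdline(read_cmdline())
    if policy is None:
        print("no cachepolicy= on the command line (kernel default in use)")
    else:
        print("booted with cachepolicy=" + policy)
```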

-Eric