Message-ID: <18249.1249591034@alphaville.usa.hp.com>
Date:	Thu, 06 Aug 2009 16:37:14 -0400
From:	Nick Dokos <nicholas.dokos@...com>
To:	Valerie Aurora <vaurora@...hat.com>
Cc:	Nick Dokos <nicholas.dokos@...com>, linux-ext4@...r.kernel.org
Subject: Re: ll_ver_fs data verification failure - 96TB fs 

Valerie Aurora <vaurora@...hat.com> wrote:

> On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> > Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> > finished without problems, but the read phase encountered a problem:
> > 
> > ...
> > read File name: /mnt/dir00373/file026
> > 
> > liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> > 
> > liverfs: Data verification failed
> > 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> > 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> > make: *** [llver] Error 2
> > 
> > 
> > The offset difference is exactly 1M, and it occurs about 3GB into the file.
> 
> Interesting - exactly 1M off.  Does this correspond to anything
> interesting in extent layout or block allocation boundaries?
> 
> Any chance you can patch ll_ver_fs to continue after the first error?
> I'd be happy to write the patch for you.

I did that to begin with, but the problem turned out to be much more
mundane: there was an IO error on one of the volumes. It wasn't
obvious (no red lights going off), but there *was* a message in
/var/log/messages that I unfortunately missed. I eventually reproduced
the error by trying to read the file with ``od -c'' and then went back
and found the original message in the log. I don't know why/how ll_ver_fs
managed to read the offset and come up with a 1M difference[1] -- ``od -c''
failed with a big thud.

We have now replaced the disk and I'm doing the test again: it should be
done (barring further problems) by sometime next week.

> 
> > In total, there are 726 directories, each with 32 4GB files (except the last,
> > which only has 12 files). So directory 373 is roughly half-way. I'll take a look
> > at the block allocation of both the directory and the file and see if they are
> > straddling the 16TB boundary (or other such).
> 
> Did you have a chance to look at what falls before and after the 16TB
> boundary?
> 

I did go barking up the wrong tree for a while :-) (or should that be :-( ?)

Thanks,
Nick

[1] that's a 2-bit flip:

3244298240 = 2#11000001011000000001000000000000
3243249664 = 2#11000001010100000001000000000000
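
The two binary strings above can be checked mechanically. What follows is
only a minimal sketch, not part of ll_ver_fs: the variable names are my own,
and the two offsets are copied from the failure message quoted earlier. It
XORs the expected and found offsets and lists which bit positions differ.

```python
# Offsets taken from the verification failure message (expected vs. found).
expected = 3244298240
found = 3243249664

# XOR leaves a 1 in every bit position where the two values disagree.
diff = expected ^ found

# Collect the differing bit positions, counting from bit 0 (least significant).
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(f"expected = {expected:032b}")
print(f"found    = {found:032b}")
print(f"xor      = {diff:032b}")
print("flipped bit positions:", flipped_bits)
print("arithmetic difference:", expected - found)
```

The two differing bits are adjacent (bits 20 and 21), which is why the
arithmetic difference still comes out to exactly 1M (2^20) even though more
than one bit changed.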