Message-Id: <151D812A-0304-470C-AC67-B7A198408459@dilger.ca>
Date:	Tue, 6 May 2014 16:38:36 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Devrin Talen <dct23@...nell.edu>
Cc:	Theodore Ts'o <tytso@....edu>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: ext4 filesystem corruption across partitions

On May 6, 2014, at 1:40 PM, Theodore Ts'o <tytso@....edu> wrote:
> On Mon, May 05, 2014 at 10:01:30PM -0400, Devrin Talen wrote:
>> 2. Write data to partition 12 via ADB (using `adb push ... /cache/`)
> 
> Instead of using ADB, I would suggest writing a test program which
> writes a series of 512 byte sectors to a single large file in /cache.
> At the beginning of each 512 byte sector include a 4 byte serial
> number (which is incremented by one for each sector), a 4 byte testID
> which is different for each run of your test program, a time stamp, a
> CRC of these fields, and then fill the rest of the sector with some
> text string to make it easy to recognize this pattern.  It can be
> anything from 0xDEADBEEF, to a string such as "DEBUGGING RANDOM HW
> BUGS REALLY SUCKS".  :-)
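
A test writer along the lines described above could be sketched as follows. This is only an illustration, not an existing tool: the little-endian field layout, the 8-byte timestamp width, the use of CRC-32 from zlib, and the names make_sector, parse_sector and write_test_file are all assumptions made for the example.

```python
import os
import struct
import time
import zlib

SECTOR_SIZE = 512
FILLER = b"DEBUGGING RANDOM HW BUGS REALLY SUCKS "
# Assumed layout: 4-byte serial, 4-byte test ID, 8-byte timestamp.
HEADER = struct.Struct("<IIQ")
CRC = struct.Struct("<I")


def make_sector(serial, test_id, timestamp):
    """Build one 512-byte sector: header, CRC-32 of the header, text filler."""
    header = HEADER.pack(serial & 0xFFFFFFFF, test_id & 0xFFFFFFFF, timestamp)
    crc = CRC.pack(zlib.crc32(header))
    pad_len = SECTOR_SIZE - len(header) - len(crc)
    filler = (FILLER * (pad_len // len(FILLER) + 1))[:pad_len]
    return header + crc + filler


def parse_sector(sector):
    """Return (serial, test_id, timestamp), or None if the CRC does not match."""
    if len(sector) != SECTOR_SIZE:
        return None
    header = sector[:HEADER.size]
    (stored_crc,) = CRC.unpack_from(sector, HEADER.size)
    if zlib.crc32(header) != stored_crc:
        return None
    return HEADER.unpack(header)


def write_test_file(path, sector_count, test_id):
    """Write sector_count consecutive test sectors to path and sync them."""
    with open(path, "wb") as f:
        for serial in range(sector_count):
            f.write(make_sector(serial, test_id, int(time.time())))
        f.flush()
        os.fsync(f.fileno())
```

Picking a fresh test_id for every run (for example from the process ID or a random number) is what lets leftovers from an earlier run be told apart from the current one.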

We wrote a tool "llverfs" to do this years ago, for debugging problems
with >16TB LUN sizes and other 64/32-bit address truncation problems:

http://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/llverfs.c

It can either partially fill the filesystem (writing all files and then
reading them all back, with one write per MB) as a fast test of the
system, or optionally fill the filesystem completely, writing to every
4kB block and then reading it back to verify the data.

Each block contains the inode number, block offset, and a timestamp
(to distinguish between separate runs) so that it can detect where
badly written data is coming from.
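
The idea of self-describing blocks can be sketched roughly like this. It is not the llverfs on-disk format: the three 64-bit little-endian fields, the repetition of the stamp across the block, and the names stamp_block and check_block are assumptions for illustration only.

```python
import struct

BLOCK_SIZE = 4096
# Assumed stamp: inode number, byte offset within the file, run timestamp.
STAMP = struct.Struct("<QQQ")


def stamp_block(ino, offset, run_ts):
    """Fill one 4kB block with a repeated (inode, offset, timestamp) stamp."""
    stamp = STAMP.pack(ino, offset, run_ts)
    reps = BLOCK_SIZE // STAMP.size + 1
    return (stamp * reps)[:BLOCK_SIZE]


def check_block(data, ino, offset, run_ts):
    """Compare the first stamp in the block with what should be there.

    Returns None when the block is as expected, otherwise the
    (inode, offset, timestamp) tuple that was actually found, which
    identifies where the stray data was originally written.
    """
    found = STAMP.unpack_from(data, 0)
    if found == (ino, offset, run_ts):
        return None
    return found
```

A mismatch with the same timestamp but a different offset points at data from the current run landing in the wrong place (for instance an offset that differs by a multiple of 2^32, as with address truncation), while a different timestamp points at stale data from an earlier run.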

There is a companion tool for doing block-device testing:

http://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/llverdev.c

Caveat - we only ever use llverfs on disposable filesystems, and
while I don't _think_ it will clobber other files that already
exist, I've never tested it in such a manner.  Obviously, llverdev
overwrites the whole block device, so it will erase all data on the
device it is pointed at.

Cheers, Andreas

> Now try to reproduce the problem with this write load.  If you can
> reproduce the problem, check and see if the corrupted file system
> block shows evidence of the string that was supposed to be
> written into /cache, showing up in /data.  You can also check that the
> large file being written in /cache has the expected serial numbers
> and checksums.
> 
> This will allow you to see if the block writes are just going to the
> wrong place on the SSD, or if something stranger might be going
> on.  Depending on the pattern of which blocks are ending up where they
> shouldn't, it might point towards different possible causes (e.g., a
> flaky solder joint, a buggy flash translation layer in the eMMC chip,
> etc.)
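
One way to look for misdirected writes is to scan a raw image of the /data partition for sectors that carry a valid test record. The sketch below assumes, purely for illustration, that each test sector starts with a little-endian 4-byte serial, 4-byte test ID and 8-byte timestamp, followed by a CRC-32 of those 16 bytes; the name find_stray_sectors is hypothetical.

```python
import struct
import zlib

SECTOR_SIZE = 512
# Assumed record header: serial, test ID, timestamp, CRC-32 of the first 16 bytes.
RECORD = struct.Struct("<IIQI")


def find_stray_sectors(image, test_id):
    """Return [(sector_index, serial), ...] for every sector of image that
    holds a record from the given test run with a matching CRC."""
    hits = []
    for off in range(0, len(image) - SECTOR_SIZE + 1, SECTOR_SIZE):
        serial, tid, _timestamp, crc = RECORD.unpack_from(image, off)
        if tid == test_id and zlib.crc32(image[off:off + 16]) == crc:
            hits.append((off // SECTOR_SIZE, serial))
    return hits
```

Each hit gives both where the stray sector landed (the sector index in the image) and which sector of the /cache test file it was meant to be (the serial), so the relationship between the two can be checked for a pattern.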
> 
> Cheers,
> 
> 					- Ted