Date:   Wed, 19 Oct 2016 16:32:33 -0400
From:   Theodore Ts'o <tytso@....edu>
To:     Jens Axboe <axboe@...com>
Cc:     Dave Chinner <david@...morbit.com>, linux-ext4@...r.kernel.org,
        fstests@...r.kernel.org, tarasov@...ily.name
Subject: Re: Test generic/299 stalling forever

On Wed, Oct 19, 2016 at 11:49:12AM -0600, Jens Axboe wrote:
> 
> Number of cores/nodes?
> Memory size?

I'm using a GCE n1-standard-2 VM.  So that's two CPUs and 7680M of memory.

Each CPU is a virtual CPU implemented as a single hardware
hyper-thread on a 2.3 GHz Intel Xeon E5 v3 (Haswell).  (I was using a
GCE zone that has Haswell processors; different GCE zones may have
different processors.  See [1] and [2] for more details.)

[1] https://cloud.google.com/compute/docs/machine-types
[2] https://cloud.google.com/compute/docs/regions-zones/regions-zones

> Rough speed and size of the device?

I'm using a GCE PD backed by an SSD.  To a first approximation, you can
think of it as a KVM qcow file stored on a fast flash device.  I'm
running LVM on the disk, and the fio is running on a 5 gig LVM volume.

> Any special mkfs options?

No.  This particular error will trigger on 4k block file systems, 1k
block file systems, 4k file systems with journals disabled, etc.  It's
fairly insensitive to the file system configuration.

> And whatever else might be relevant.

Note that the generic/299 test is running fio in an ENOSPC hitter
configuration, where there is an antagonist thread which is
constantly allocating all of the disk space available and then freeing
it all:

# FSQA Test No. 299
#
# AIO/DIO stress test
# Run random AIO/DIO activity and fallocate/truncate simultaneously
# Test will operate on huge sparsed files so ENOSPC is expected.


So some of the AIO/DIO operations will be failing with an error, and
I suspect that's very likely relevant to reproducing the failure.

The actual guts of the test from generic/299[3]:

[3] https://git.kernel.org/cgit/fs/xfs/xfstests-dev.git/tree/tests/generic/299

_workout()
{
	echo ""
	echo "Run fio with random aio-dio pattern"
	echo ""
	cat $fio_config >>  $seqres.full
	run_check $FIO_PROG $fio_config &
	pid=$!
	echo "Start fallocate/truncate loop"

	for ((i=0; ; i++))
	do
	    for ((k=1; k <= NUM_JOBS; k++))
	    do
		$XFS_IO_PROG -f -c "falloc 0 $FILE_SIZE" \
			$SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
	    done
	    for ((k=1; k <= NUM_JOBS; k++))
	    do
		$XFS_IO_PROG -c "truncate  0" \
			$SCRATCH_MNT/direct_aio.$k.0 >> $seqres.full 2>&1
	    done
	    # Following like will check that pid is still run.
	    # Once fio exit we can stop fallocate/truncate loop
	    pgrep -f "$FIO_PROG" > /dev/null 2>&1 || break
	done
	wait $pid
}

So what's happening is that generic/299 is looping in the
fallocate/truncate loop until fio exits, and since fio never exits,
it ends up looping forever.
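
For illustration only, here is a minimal sketch of how a polling loop
like this could be bounded by a deadline, so that a stuck child shows
up as a failure rather than an endless loop.  The helper name
wait_with_deadline and its arguments are hypothetical; nothing like
this exists in generic/299 or in xfstests as far as this message is
concerned.  It polls one specific pid with kill -0 rather than using
pgrep, and it assumes the pid is a child of the calling shell:

```shell
# Hypothetical helper, not part of xfstests.
# Usage: wait_with_deadline PID TIMEOUT_SECONDS
# Returns 0 if the process exited before the deadline, 1 otherwise.
wait_with_deadline()
{
	local pid=$1
	local deadline=$(( $(date +%s) + $2 ))

	# kill -0 delivers no signal; it only checks that the pid exists.
	while kill -0 "$pid" 2>/dev/null
	do
		if [ "$(date +%s)" -ge "$deadline" ]; then
			return 1
		fi
		# In a test like generic/299, one fallocate/truncate
		# pass would run here instead of a plain sleep.
		sleep 1
	done
	return 0
}
```

A caller would start the workload in the background, pass $! and a
timeout to the helper, and treat a non-zero return as "the workload
is stuck", for example by dumping its state before killing it.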

Cheers,

					- Ted
