linux-ext4 - Re: [PATCH][18/28] e2fsprogs-tests-f_random

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <20080202084627.GS31694@webber.adilger.int>
Date:	Sat, 02 Feb 2008 01:46:27 -0700
From:	Andreas Dilger <adilger@....com>
To:	"Theodore Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org
Subject: Re: [PATCH][18/28] e2fsprogs-tests-f_random_corruption.patch

The f_random_corruption test enables a random subset of filesystem features,
picks one of the valid filesystem block and inode sizes, and a random device
size and creates a new filesystem with those parameters.

It is possible to disable the running of the test by setting the environment
variable F_RANDOM_CORRUPTION=skip.  By default the test script is run only
one time, but setting the LOOP_COUNT variable allows the test to run multiple
times.

If the script is running as root the filesystem is mounted and populated with
file data to allow a more useful test filesystem to be generated.  In some
cases the kernel may not support the requested filesystem features and the
filesystem cannot be mounted.  This is not considered a test failure.

The resulting filesystem is corrupted with both random data and by shifting
data from one part of the device to another and then repaired by e2fsck.
In some rare cases the random corruption is severe enough that the filesystem
is not recoverable (e.g. small filesystem with no backup superblock has bad
superblock corruption) but in most cases "e2fsck -fy" should be able to fix
all errors in some way.

After e2fsck has repaired the filesystem, it is optionally mounted (if the
environment variable MOUNT_AFTER_CORRUPTION=yes is set) and the test files
created in the filesystem are deleted.  This verifies that the fixes in the
filesystem are sufficient for the kernel to use the filesystem without error.
Since there is some possibility of the kernel oopsing if there is a filesystem
bug, this part of the test is not enabled by default.

Signed-off-by: Andreas Dilger <adilger@...sterfs.com>
Signed-off-by: Kalpak Shah <kalpak@...sterfs.com>

Index: e2fsprogs-1.40.4/tests/f_random_corruption/script
===================================================================
--- /dev/null
+++ e2fsprogs-1.40.4/tests/f_random_corruption/script
@@ -0,0 +1,277 @@
+# This is to make sure that if this test fails other tests can still be run
+# instead of doing an exit. We break before the end of the loop.
+export LOOP_COUNT=${LOOP_COUNT:-1}
+export COUNT=0
+
+while [ $COUNT -lt $LOOP_COUNT ]; do
+[ "$F_RANDOM_CORRUPTION" = "skip" ] && echo "skipped" && break
+
+# choose block and inode sizes randomly
+BLK_SIZES=(1024 2048 4096)
+INODE_SIZES=(128 256 512 1024)
+
+SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
+RANDOM=$SEED
+
+IMAGE=${IMAGE:-$TMPFILE}
+DATE=`date '+%Y%m%d%H%M%S'`
+ARCHIVE=$IMAGE.$DATE
+SIZE=${SIZE:-$(( 192000 + RANDOM + RANDOM )) }
+FS_TYPE=${FS_TYPE:-ext3}
+BLK_SIZE=${BLK_SIZES[(( $RANDOM % ${#BLK_SIZES[*]} ))]}
+INODE_SIZE=${INODE_SIZES[(( $RANDOM % ${#INODE_SIZES[*]} ))]}
+DEF_FEATURES="sparse_super,filetype,dir_index"
+FEATURES=${FEATURES:-$DEF_FEATURES}
+MOUNT_OPTS="-o loop"
+MNTPT=$test_dir/temp
+OUT=$test_name.log
+FAILED=$test_name.failed
+OKFILE=$test_name.ok
+
+# Do you want to try and mount the filesystem?
+MOUNT_AFTER_CORRUPTION=${MOUNT_AFTER_CORRUPTION:-"no"}
+# Do you want to remove the files from the mounted filesystem?
+# Ideally use it only in test environment.
+REMOVE_FILES=${REMOVE_FILES:-"no"}
+
+# In KB
+CORRUPTION_SIZE=${CORRUPTION_SIZE:-64}
+CORRUPTION_ITERATIONS=${CORRUPTION_ITERATIONS:-5}
+
+MKFS=../misc/mke2fs
+E2FSCK=../e2fsck/e2fsck
+FIRST_FSCK_OPTS="-fyv"
+SECOND_FSCK_OPTS="-fyv"
+
+# Lets check if the image can fit in the current filesystem.
+BASE_DIR=`dirname $IMAGE`
+BASE_AVAIL_BLOCKS=`df -P -k $BASE_DIR  | awk '/%/ { print $4 }'`
+
+if (( BASE_AVAIL_BLOCKS < NUM_BLKS * (BLK_SIZE / 1024) )); then
+	echo "$BASE_DIR does not have enough space to accomodate test image."
+	echo "Skipping test...."
+	break;
+fi
+
+# Lets have a journal more times than not.
+HAVE_JOURNAL=$((RANDOM % 12 ))
+if (( HAVE_JOURNAL == 0 )); then
+	FS_TYPE="ext2"
+	HAVE_JOURNAL=""
+else
+	HAVE_JOURNAL="-j"
+fi
+
+# Experimental features should not be used too often.
+LAZY_BG=$(( $RANDOM % 12 ))
+if (( LAZY_BG == 0 )); then
+	FEATURES=$FEATURES,lazy_bg
+fi
+ 
+# meta_bg and resize_inode features should not be enabled simultaneously
+META_BG=$(( $RANDOM % 12 ))
+if (( META_BG == 0 )); then
+	FEATURES=$FEATURES,meta_bg
+else
+	FEATURES=$FEATURES,resize_inode
+fi
+
+modprobe ext4 2> /dev/null
+modprobe ext4dev 2> /dev/null
+
+# If ext4 is present in the kernel then we can play with ext4 options
+EXT4=`grep ext4 /proc/filesystems`
+if [ -n "$EXT4" ]; then
+	USE_EXT4=$((RANDOM % 2 ))
+	if (( USE_EXT4 == 1 )); then
+		FS_TYPE="ext4dev"
+	fi
+fi
+
+if [ "$FS_TYPE" = "ext4dev" ]; then
+	UNINIT_GROUPS=$((RANDOM % 12 ))
+	if (( UNINIT_GROUPS == 0 )); then
+		FEATURES=$FEATURES,uninit_groups
+	fi
+	EXPAND_ESIZE=$((RANDOM % 12 ))
+	if (( EXPAND_EISIZE == 0 )); then
+		FIRST_FSCK_OPTS=$FIRST_FSCK_OPTS," -E expand_extra_isize"
+	fi
+fi
+
+MKFS_OPTS="$HAVE_JOURNAL -b $BLK_SIZE -I $INODE_SIZE -O $FEATURES"
+
+NUM_BLKS=$(((SIZE * 1024) / BLK_SIZE))
+
+log()
+{
+	[ "$VERBOSE" ] && echo "$*"
+	echo "$*" >> $OUT
+}
+
+error()
+{
+	log "$*"
+	echo "$*" >> $FAILED
+}
+
+unset_vars()
+{
+	unset IMAGE DATE ARCHIVE FS_TYPE SIZE BLK_SIZE MKFS_OPTS MOUNT_OPTS
+	unset E2FSCK FIRST_FSCK_OPTS SECOND_FSCK_OPTS OUT FAILED OKFILE
+}
+
+cleanup()
+{
+	[ "$1" ] && error "$*" || error "Error occured..."
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	cp $OUT $OUT.$DATE
+	echo " failed"
+	echo "*** This appears to be a bug in e2fsprogs ***"
+	echo "Please contact linux-ext4@...r.kernel.org for further assistance."
+	echo "Include $OUT.$DATE, and save $ARCHIVE locally for reference."
+	unset_vars
+	break;
+}
+
+echo -n "Random corruption test for e2fsck:"
+# Truncate the output log file
+rm -f $FAILED $OKFILE
+> $OUT
+
+get_random_location()
+{
+	total=$1
+
+	tmp=$(((RANDOM * 32768) % total))
+
+	# Try and have more corruption in metadata at the start of the
+	# filesystem.
+	if ((tmp % 3 == 0 || tmp % 5 == 0 || tmp % 7 == 0)); then
+		tmp=$((tmp % 32768))
+	fi
+
+	echo $tmp
+}
+
+make_fs_dirty()
+{
+	from=`get_random_location $NUM_BLKS`
+
+	# Number of blocks to write garbage into should be within fs and should
+	# not be too many.
+	num_blks_to_dirty=$((RANDOM % $1))
+
+	# write garbage into the selected blocks
+	[ ! -c /dev/urandom ] && return
+	log "writing ${num_blks_to_dirty}kB random garbage at offset ${from}kB"
+	dd if=/dev/urandom of=$IMAGE bs=1kB seek=$from conv=notrunc \
+		count=$num_blks_to_dirty >> $OUT 2>&1
+}
+
+
+touch $IMAGE
+log "Format the filesystem image..."
+log
+# Write some garbage blocks into the filesystem to make sure e2fsck has to do
+# a more difficult job than checking blocks of zeroes.
+log "Copy some random data into filesystem image...."
+make_fs_dirty 32768
+log "$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT"
+$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT 2>&1
+if [ $? -ne 0 ]; then
+	zero_size=`grep "Device size reported to be zero" $OUT`
+	short_write=`grep "Attempt to write block from filesystem resulted in short write" $OUT`
+
+	if (( zero_size != 0 || short_write != 0 )); then
+		echo "mkfs failed due to device size of 0 or a short write. This is harmless and need not be reported."
+	else
+		cleanup "mkfs failed - internal error during operation. Aborting random regression test..."
+	fi
+fi
+
+if [ `id -u` = 0 ]; then
+	mkdir -p $MNTPT
+	if [ $? -ne 0 ]; then
+		log "Failed to create or find mountpoint...."
+	else
+		mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+		if [ $? -ne 0 ]; then
+			log "Unable to mount file system - skipped"
+		else
+			df -h $MNTPT >> $OUT
+			df -i $MNTPT >> $OUT
+			log "Copying data into the test filesystem..."
+
+			cp -r ../ $MNTPT >> $OUT 2>&1
+			sync
+			umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+		fi
+	fi
+else
+	log "skipping mount test for non-root user"
+fi
+
+log "Corrupt the image by moving around blocks of data..."
+log
+for (( i = 0; i < $CORRUPTION_ITERATIONS; i++ )); do
+	from=`get_random_location $NUM_BLKS`
+	to=`get_random_location $NUM_BLKS`
+
+	log "Moving ${CORRUPTION_SIZE}kB from block ${from}kB to ${to}kB"
+	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=$from seek=$to >> $OUT 2>&1
+
+	# more corruption by overwriting blocks from within the filesystem.
+	make_fs_dirty $CORRUPTION_SIZE
+done
+
+# Copy the image for reproducing the bug.
+cp --sparse=always $IMAGE $ARCHIVE >> $OUT 2>&1
+
+log "First pass of fsck..."
+$E2FSCK $FIRST_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+
+# Run e2fsck for the second time and check if the problem gets solved.
+# After we can report error with pass1.
+[ $((RET & 1)) == 0 ] || log "The first fsck corrected errors"
+[ $((RET & 2)) == 0 ] || error "The first fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || error "The first fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || error "The first fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || error "The first fsck reports there was a usage error"
+[ $((RET & 32)) == 0 ] || error "The first fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || error "The first fsck reports a library error"
+
+log "---------------------------------------------------------"
+
+log "Second pass of fsck..."
+$E2FSCK $SECOND_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+[ $((RET & 1)) == 0 ] || cleanup "The second fsck corrected errors!"
+[ $((RET & 2)) == 0 ] || cleanup "The second fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || cleanup "The second fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || cleanup "The second fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || cleanup "The second fsck reports a usage error"
+[ $((RET & 32)) == 0 ] || cleanup "The second fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || cleanup "The second fsck reports a library error"
+
+[ -f $FAILED ] && cleanup
+
+if [ "$MOUNT_AFTER_CORRUPTION" = "yes" ]; then
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+	[ $? -ne 0 ] && log "Unable to mount file system - skipped"
+
+	[ "$REMOVE_FILES" = "yes" ] && rm -rf $MNTPT/* >> $OUT
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+rm -f $ARCHIVE
+
+# Report success
+echo " ok"
+echo "Succeeded..." > $OKFILE
+
+unset_vars
+
+COUNT=$((COUNT + 1))
+done
Index: e2fsprogs-1.40.4/tests/Makefile.in
===================================================================
--- e2fsprogs-1.40.4.orig/tests/Makefile.in
+++ e2fsprogs-1.40.4/tests/Makefile.in
@@ -24,6 +24,8 @@ test_script: test_script.in Makefile
 	@chmod +x test_script
 
 check:: test_script
+	@echo "Removing remnants of earlier tests..."
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img2*
 	@echo "Running e2fsprogs test suite..."
 	@echo " "
 	@./test_script
@@ -63,7 +65,7 @@ testend: test_script ${TDIR}/image
 	@echo "If all is well, edit ${TDIR}/name and rename ${TDIR}."
 
 clean::
-	$(RM) -f *~ *.log *.new *.failed *.ok test.img test_script
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img* test_script
 
 distclean:: clean
 	$(RM) -f Makefile

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html