linux-ext4 - buffered writeback torture program

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1303322378-sup-1722@think>
Date:	Wed, 20 Apr 2011 14:23:29 -0400
From:	Chris Mason <chris.mason@...cle.com>
To:	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-ext4 <linux-ext4@...r.kernel.org>, xfs@....sgi.com,
	jack@...e.cz
Cc:	axboe@...nel.dk
Subject: buffered writeback torture program

Hi everyone,

I dug out my old fsync latency tester to make sure Jens' new plugging
code hadn't caused regressions.  This started off as a program Ted wrote
during the firefox dark times, and I added some more code to saturate
spindles with random IO.

The basic idea is:

1) make a nice big sequential 8GB file
2) fork a process doing random buffered writes inside that file
3) overwrite a second 4K file in a loop, doing fsyncs as you go.

The test program times how long each write and fsync take in step three.
The idea is that if we have problems with concurrent buffered writes and
fsyncs, then all of our fsyncs will get stuck behind the random IO
writeback and our latencies will be high.

For a while, xfs, btrfs and ext4 did really well at this test.  Our
fsync latencies were very small and we all sent down synchronous IO that
the elevator dispatched straight to the drive.

Things have changed though, both xfs and ext4 have grown code to do
dramatically more IO than write_cache_pages has asked for (I'm pretty
sure I told everyone this was a good idea at the time).  When doing
sequential writes, this is a great idea.  When doing random IO, it leads
to unbound stalls in balance_dirty_pages.

Here's an example run on xfs:

# fsync-tester
setting up random write file
done setting up random write file
starting fsync run
starting random io!
write time 0.0009s fsync time: 2.0142s
write time 128.9305s fsync time: 2.6046s
run done 2 fsyncs total, killing random writer

In this case the 128s spent in write was on a single 4K overwrite on a
4K file.

ext4 is doing a blanket increase on wbc->nr_to_write, while xfs is
pushing down any other IO inside the same extent as the writepage it
just serviced.  The test program ends up being a worst case for both of
them.

I honestly don't know what the right answer is here, but I'd say we
should be backing off our nr_to_write bumps if we notice a lot of random
IO.

Btrfs does bump wbc->nr_to_write as well, but only as far as we have
sequential IO to process.  We also end up throttling the random IO
harder because it is a worst case for our delayed allocation reservation
code.

The test program is below, and can also be found at:

http://oss.oracle.com/~mason/fsync-tester.c

Usage: gcc -Wall -o fsync-tester fsync-tester.c

mkfs.your_favorite /dev/xxx
mount /dev/xxx /mnt
cd /mnt
fsync-tester

It creates a single 8GB file and one 4K file, so your FS must be at
least that big.

-chris

/*
 * fsync-tester.c
 *
 * Written by Theodore Ts'o, 3/21/09.  Updated by Chris Mason to include
 * the random writer thread
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 */
#define _FILE_OFFSET_BITS 64
#define _XOPEN_SOURCE 500
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <signal.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>

#define SIZE (32768*32)
static char bigbuf[4096];

static float timeval_subtract(struct timeval *tv1, struct timeval *tv2)
{
	return ((tv1->tv_sec - tv2->tv_sec) +
		((float) (tv1->tv_usec - tv2->tv_usec)) / 1000000);
}

static void random_io(int fd, loff_t total)
{
	loff_t cur = 0;
	int ret;

	/* just some constant so our runs are always the same */
	srand(4096);
	while(1) {
		/*
		 * we want a random offset into the file,
		 * but rand only returns max in.  So we make
		 * it a random block number instead, and multiply
		 * by 4096.
		 */
		cur = rand();
		cur = (cur * 4096) % (total - 4096);

		/* align our offset  to 4k */
		cur = cur / 4096;
		cur = cur * 4096;
		ret = pwrite(fd, bigbuf, 4096, cur);
		if (ret < 4096) {
			fprintf(stderr, "short write ret %d cur %llu\n",
				ret, (unsigned long long)cur);
			exit(1);
		}
	}
}

int main(int argc, char **argv)
{
	int	fd;
	struct timeval tv, tv2, start;
	char buf[SIZE];
	pid_t pid;
	loff_t total = ((loff_t)8) * 1024 * 1024 * 1024;
	loff_t cur = 0;
	int rand_fd;
	int ret;
	int i;
	int status;
	struct stat st;

	memset(bigbuf, 0, 4096);

	rand_fd = open("fsync-tester.rnd-file", O_WRONLY|O_CREAT);
	if (rand_fd < 0) {
		perror("open");
		exit(1);
	}

	ret = fstat(rand_fd, &st);
	if (ret < 0) {
		perror("fstat");
		exit(1);
	}

	if (st.st_size < total) {
		printf("setting up random write file\n");
		while(cur < total) {
			ret = write(rand_fd, bigbuf, 4096);
			if (ret <= 0) {
				fprintf(stderr, "short write\n");
				exit(1);
			}
			cur += ret;
		}
		printf("done setting up random write file\n");
	}

	fd = open("fsync-tester.tst-file", O_WRONLY|O_CREAT);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	memset(buf, 'a', SIZE);

	pid = fork();
	if (pid == 0) {
		printf("starting random io!\n");
		random_io(rand_fd, total);
		exit(0);
	}

	close(rand_fd);

	gettimeofday(&start, NULL);
	printf("starting fsync run\n");
	for(i = 0; i < 60; i++) {
		float pwrite_time;

		gettimeofday(&tv2, NULL);
		pwrite(fd, buf, SIZE, 0);
		gettimeofday(&tv, NULL);

		pwrite_time = timeval_subtract(&tv, &tv2);

		fsync(fd);
		gettimeofday(&tv2, NULL);

		printf("write time: %5.4fs fsync time: %5.4fs\n", pwrite_time,
		       timeval_subtract(&tv2, &tv));

		if (timeval_subtract(&tv2, &start) > 60)
			break;
		sleep(4);
	}
	printf("run done %d fsyncs total, killing random writer\n", i + 1);
	fflush(stdout);
	kill(pid, SIGTERM);
	wait(&status);

	return 0;
}



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html