linux-kernel - Re: Read starvation by sync writes


Date:	Tue, 11 Dec 2012 16:44:15 -0500
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: Read starvation by sync writes

Jan Kara <jack@...e.cz> writes:

>   Hi,
>
>   I was looking into IO starvation problems where streaming sync writes (in
> my case from kjournald but DIO would look the same) starve reads. This is
> because reads happen in small chunks and until a request completes we don't
> start reading further (reader reads lots of small files) while writers have
> plenty of big requests to submit. Both processes end up fighting for IO
> requests and writer writes nr_batching 512 KB requests while reader reads
> just one 4 KB request or so. Here the effect is magnified by the fact that
> the drive has relatively big queue depth so it usually takes longer than
> BLK_BATCH_TIME to complete the read request. The net result is it takes
> close to two minutes to read files that can be read in under a second without
> writer load. Without the drive's big queue depth, results are not ideal but
> they are bearable - it takes about 20 seconds to do the reading. And for
> comparison, when writer and reader are not competing for IO requests (as it
> happens when writes are submitted as async), it takes about 2 seconds to
> complete reading.
>
> Simple reproducer is:
>
> echo 3 >/proc/sys/vm/drop_caches
> dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
> sleep 30
> time cat /etc/* 2>&1 >/dev/null
> killall dd
> rm /tmp/f

This is a buffered writer.  How do you end up doing all synchronous
write I/O?  Also, you did not mention which file system and which I/O
scheduler you were using.

Is this happening in some real workload?  If so, can you share what that
workload is?  How about some blktrace data?
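The details requested above could be collected roughly as follows. This is
only a sketch: the device name "sda", the target path /tmp, and the helper
names fs_type, active_scheduler and capture_trace are placeholders chosen for
illustration, not anything taken from the original report.

```shell
#!/bin/sh
# Assumptions: DEV is the disk backing TARGET; both are placeholders.
DEV=${DEV:-sda}
TARGET=${TARGET:-/tmp}
SYSFS=${SYSFS:-/sys}

# Print the file system type of the mount that holds the given path.
fs_type() {
    df -PT "$1" 2>/dev/null | awk 'NR==2 {print $2}'
}

# Print the active I/O scheduler, i.e. the name shown in [brackets].
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$SYSFS/block/$1/queue/scheduler" 2>/dev/null
}

# Record block-layer events for a number of seconds, then decode them.
# Needs root and a mounted debugfs, so it is defined but not run here.
capture_trace() {
    blktrace -d "/dev/$1" -o "$2" -w "$3" && blkparse -i "$2"
}

echo "filesystem: $(fs_type "$TARGET")"
echo "scheduler:  $(active_scheduler "$DEV")"
```

Running something like "capture_trace sda trace 30" as root while the
reproducer is active should produce a trace that can be shared on the list.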

>   The question is how can we fix this? Two quick hacks that come to my mind
> are to remove the timeout from the batching logic (is it that important?) or
> to further separate the request allocation logic so that reads have their own
> request pool. A more systematic fix would be to change the request allocation
> logic to always allow at least a fixed number of requests per IOC. What do
> people think about this?

There has been talk of removing the limit on the number of requests
allocated, but I haven't seen patches for it, and I certainly am not
convinced of its practicality.  Today, when using block cgroups you do
get a request list per cgroup, so that's kind of the same thing as one
per ioc.  I can certainly see moving in that direction for the
non-cgroup case.
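
For reference, the current request limit and the drive's queue depth can be
read from sysfs. The sketch below assumes a SCSI or SATA disk named "sda"
(a placeholder); show_request_limits is a hypothetical helper, and the
queue_depth file may be absent for other device types.

```shell
#!/bin/sh
# Assumption: "sda" is a placeholder for the disk under test.
SYSFS=${SYSFS:-/sys}

# Print the block-layer request limit and the device queue depth.
show_request_limits() {
    dev=$1
    for f in queue/nr_requests device/queue_depth; do
        path="$SYSFS/block/$dev/$f"
        if [ -r "$path" ]; then
            printf '%s: %s\n' "$f" "$(cat "$path")"
        else
            printf '%s: (not available)\n' "$f"
        fi
    done
}

show_request_limits "${DEV:-sda}"
```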

First, though, I'd like to better understand your workload.

Cheers,
Jeff