linux-kernel - Re: Read starvation by sync writes

Date:	Thu, 13 Dec 2012 09:43:31 +0800
From:	Shaohua Li <shli@...nel.org>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: Read starvation by sync writes

2012/12/12 Jan Kara <jack@...e.cz>:
> On Wed 12-12-12 10:55:15, Shaohua Li wrote:
>> 2012/12/11 Jan Kara <jack@...e.cz>:
>> >   Hi,
>> >
>> >   I was looking into IO starvation problems where streaming sync writes (in
>> > my case from kjournald, but DIO would look the same) starve reads. This is
>> > because reads happen in small chunks and, until a request completes, we
>> > don't start reading further (the reader reads lots of small files), while
>> > writers have plenty of big requests to submit. Both processes end up
>> > fighting for IO requests, and the writer writes nr_batching 512 KB
>> > requests while the reader reads just one 4 KB request or so. Here the
>> > effect is magnified by the fact that the drive has a relatively big queue
>> > depth, so it usually takes longer than BLK_BATCH_TIME to complete the read
>> > request. The net result is that it takes close to two minutes to read
>> > files that can be read in under a second without writer load. Without the
>> > drive's big queue depth, results are not ideal but they are bearable: it
>> > takes about 20 seconds to do the reading. For comparison, when the writer
>> > and reader are not competing for IO requests (as happens when writes are
>> > submitted as async), it takes about 2 seconds to complete the reading.
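To put the imbalance in numbers, here is a back-of-the-envelope sketch that uses only the sizes quoted in this thread: a full writer batch of 32 requests of 512 KB each versus a single 4 KB read. It is plain bash arithmetic, not a measurement, and `batch_kb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope arithmetic for one round of request allocation,
# using the sizes quoted in the thread. Units are KB; values are
# illustrative, not measured.

# batch_kb (hypothetical helper): $1 = requests per batch, $2 = KB per request
batch_kb() {
	echo $(( $1 * $2 ))
}

writer_kb=$(batch_kb 32 512)   # full writer batch: 32 x 512 KB
reader_kb=$(batch_kb 1 4)      # reader: a single 4 KB request
echo "writer: ${writer_kb} KB, reader: ${reader_kb} KB, ratio: $(( writer_kb / reader_kb ))"
# prints: writer: 16384 KB, reader: 4 KB, ratio: 4096
```

In other words, for each round of request allocation the writer can move on the order of 4096 times more data than the reader, before the drive's queue depth is even taken into account.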
>> >
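>> > A simple reproducer (run as root) is:
>> >
>> > sync; echo 3 >/proc/sys/vm/drop_caches         # start with a cold page cache
>> > dd if=/dev/zero of=/tmp/f bs=1M count=10000 &  # ~10 GB streaming writer
>> > sleep 30                                       # let the writer ramp up
>> > time cat /etc/* >/dev/null 2>&1                # timed reader of small files
>> > killall dd
>> > rm /tmp/f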
>> >
>> >   The question is how we can fix this. Two quick hacks that come to my
>> > mind are to remove the timeout from the batching logic (is it that
>> > important?) or to further separate the request allocation logic so that
>> > reads have their own request pool. A more systematic fix would be to change
>> > the request allocation logic to always allow at least a fixed number of
>> > requests per IOC. What do people think about this?
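The "fixed number of requests per IOC" idea can be illustrated with a small decision function. This is a hypothetical bash sketch, not the kernel's request allocation code: `may_allocate` and its parameter names are made up, and the limit of 128 used in the examples is only an illustrative value for nr_requests.

```shell
#!/usr/bin/env bash
# Hypothetical allocation check illustrating "always allow at least a
# fixed number of requests per IOC". Not kernel code; names are made up.
#   $1 = requests currently allocated queue-wide
#   $2 = requests currently allocated by this IOC
#   $3 = queue-wide limit (nr_requests)
#   $4 = guaranteed minimum per IOC
# Returns 0 if the IOC may allocate now, 1 if it has to wait.
may_allocate() {
	local total=$1 mine=$2 limit=$3 min_per_ioc=$4
	# An IOC below its guaranteed minimum may always allocate,
	# even when the shared pool is exhausted.
	if (( mine < min_per_ioc )); then
		return 0
	fi
	# Otherwise fall back to the shared queue-wide limit.
	(( total < limit ))
}
```

With the queue full, a reader holding no requests is still granted one (`may_allocate 128 0 128 4` succeeds), while a writer holding all 128 has to wait (`may_allocate 128 128 128 4` fails). The trade-off of this design is that the queue-wide count can exceed nr_requests by up to the per-IOC minimum times the number of active IOCs.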
>>
>> As long as queue depth > workload iodepth, there is little we can do
>> to prioritize tasks/IOCs, because throttling a task/IOC means the queue
>> will be idle. We don't want to idle a queue (especially for SSDs), so
>> we always push as many requests as possible to the queue, which
>> breaks any prioritization. As far as I know, CFQ has always had this
>> issue with disks that have a big queue depth.
>   Yes, I understand that. But actually a big queue depth on its own doesn't
> make the problem really bad (at least for me). When the reader doesn't have
> to wait for free IO requests, it progresses at a reasonable speed. What
> makes it really bad is that the big queue depth effectively disallows any
> use of ioc_batching() mode for the reader, and thus it blocks in request
> allocation for every single read request, unlike the writer, which always
> uses its full batch (32 requests).
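The batching behaviour described above can be modelled roughly as follows. This bash sketch loosely mirrors the shape of the ioc_batching() check in the block layer of that era; it assumes BLK_BATCH_TIME = HZ/50 (20 ms), uses the batch size of 32 quoted in the thread, and ignores the congestion thresholds and sync/async split of the real allocator. `is_batching` and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the "is this task still batching?" check.
# Times are in ms. BATCH_TIME_MS=20 assumes BLK_BATCH_TIME = HZ/50;
# NR_BATCHING=32 is the batch size quoted in the thread.
BATCH_TIME_MS=20
NR_BATCHING=32

#   $1 = requests left in this IOC's batch
#   $2 = time of the wait that granted the batch (ms)
#   $3 = current time (ms)
# Returns 0 if the task is still batching (may allocate past a full queue).
is_batching() {
	local left=$1 last_waited=$2 now=$3
	# A fresh, untouched batch always counts as batching.
	(( left == NR_BATCHING )) && return 0
	# Otherwise requests must remain AND the batch window must still be open.
	(( left > 0 && now < last_waited + BATCH_TIME_MS ))
}
```

For example, a writer granted its batch at t=0 that allocates its requests within a few milliseconds stays in batching mode (`is_batching 31 0 5` succeeds). A reader granted a batch at t=0 that used one request and then waited 50 ms for it to complete on a deep-queue drive has dropped out of batching mode (`is_batching 31 0 50` fails) and must wait for a free request again, which matches the behaviour described above.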

This can't explain why setting the queue depth to 1 makes the performance
better. In that case, writes still get that number of requests and reads
still wait for a request. Anyway, try setting nr_requests to a big number
and check whether the performance is different.
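For trying these experiments, the relevant knobs are exposed in sysfs: `queue/nr_requests` for the block layer's request pool and, for SCSI disks, `device/queue_depth` for the drive's queue depth. The helper below is a sketch: `DEV=sda` is a placeholder, the `SYSFS` override exists only so the functions can be exercised outside a real /sys, and writing the values needs root.

```shell
#!/usr/bin/env bash
# Inspect (and optionally change) the two knobs discussed in this thread.
# DEV is a placeholder device name; SYSFS defaults to /sys.
#   queue/nr_requests  - caps request allocation in the block layer
#   device/queue_depth - the SCSI device's queue depth (SCSI-like disks only)
SYSFS=${SYSFS:-/sys}
DEV=${DEV:-sda}

show_queue_knobs() {
	local q="$SYSFS/block/$DEV"
	echo "nr_requests=$(cat "$q/queue/nr_requests")"
	# queue_depth only exists for devices that expose it (e.g. SCSI disks).
	if [ -r "$q/device/queue_depth" ]; then
		echo "queue_depth=$(cat "$q/device/queue_depth")"
	fi
}

# Writing requires root, e.g. "set_nr_requests 1024".
set_nr_requests() {
	echo "$1" > "$SYSFS/block/$DEV/queue/nr_requests"
}
```

Running `show_queue_knobs` before and after `set_nr_requests 1024` (or after writing 1 to `device/queue_depth`) and repeating the reproducer gives the comparison suggested above.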

Thanks,
Shaohua