[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <x49ab1cxcma.fsf@segfault.boston.devel.redhat.com>
Date: Thu, 03 Sep 2009 11:38:05 -0400
From: Jeff Moyer <jmoyer@...hat.com>
To: Corrado Zoccolo <czoccolo@...il.com>
Cc: Linux-Kernel <linux-kernel@...r.kernel.org>,
Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [RFC] cfq: adapt slice to number of processes doing I/O
Jeff Moyer <jmoyer@...hat.com> writes:
> Corrado Zoccolo <czoccolo@...il.com> writes:
>
>> When the number of processes performing I/O concurrently increases, a
>> fixed time slice per process will cause large latencies.
>> In the patch, if there are more than 3 processes performing concurrent
>> I/O, we scale the time slice down proportionally.
>> To safeguard sequential bandwidth, we impose a minimum time slice,
>> computed from cfq_slice_idle (the idea is that cfq_slice_idle
>> approximates the cost for a seek).
>>
>> I performed two tests, on a rotational disk:
>> * 32 concurrent processes performing random reads
>> ** the bandwidth is improved from 466KB/s to 477KB/s
>> ** the maximum latency is reduced from 7.667s to 1.728
>> * 32 concurrent processes performing sequential reads
>> ** the bandwidth is reduced from 28093KB/s to 24393KB/s
>> ** the maximum latency is reduced from 3.781s to 1.115s
>>
>> I expect numbers to be even better on SSDs, where the penalty to
>> disrupt sequential read is much less.
>
> Interesting approach. I'm not sure what the benefits will be on SSDs,
> as the idling logic is disabled for them (when nonrot is set and they
> support ncq). See cfq_arm_slice_timer.
>
>> Signed-off-by: Corrado Zoccolo <czoccolo@...il-com>
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index fd7080e..cff4ca8 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -306,7 +306,15 @@ cfq_prio_to_slice(struct cfq_data *cfqd, struct
>> cfq_queue *cfqq)
>> static inline void
>> cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>> {
>> - cfqq->slice_end = cfq_prio_to_slice(cfqd, cfqq) + jiffies;
>> + unsigned low_slice = cfqd->cfq_slice_idle * (1 + cfq_cfqq_sync(cfqq));
>> + unsigned interested_queues = cfq_class_rt(cfqq) ?
>> cfqd->busy_rt_queues : cfqd->busy_queues;
>
> Either my mailer displayed this wrong, or yours wraps lines.
>
>> + unsigned slice = cfq_prio_to_slice(cfqd, cfqq);
>> + if (interested_queues > 3) {
>> + slice *= 3;
>
> How did you come to this magic number of 3, both for the number of
> competing tasks and the multiplier for the slice time? Did you
> experiment with this number at all?
>
>> + slice /= interested_queues;
>
> Of course you realize this could disable the idling logic completely,
> right? I'll run this patch through some tests and let you know how it
> goes.
I missed that you updated the slice end based on a max of slice and
low_slice. Sorry about that.
This patch does not fare well when judging fairness between processes.
I have several fio jobs that generate read workloads, and I try to
figure out whether the I/O scheduler is providing fairness based on the
I/O priorities of the processes. With your patch applied, we get the
following results:
total priority: 880
total data transferred: 1045920
class prio ideal xferred %diff
be 0 213938 352500 64
be 1 190167 193012 1
be 2 166396 123380 -26
be 3 142625 86260 -40
be 4 118854 62964 -48
be 5 95083 40180 -58
be 6 71312 74484 4
be 7 47541 113140 137
Class and prio should be self-explanatory. ideal is my cooked up
version of the ideal number of bytes the given priority should have
transferred based on the total data transferred and all processes
weighted by priority competing for the disk. xferred is the actual
amount of data transferred, and %diff is the difference between those
last two columns.
Notice that best effort priority 7 managed to transfer more data than be
prio 3. That's bad. Now, let's look at 8 processes all at the same
priority level:
total priority: 800
total data transferred: 1071036
class prio ideal xferred %diff
be 4 133879 222452 66
be 4 133879 243188 81
be 4 133879 187380 39
be 4 133879 42512 -69
be 4 133879 39156 -71
be 4 133879 47604 -65
be 4 133879 37364 -73
be 4 133879 251380 87
Hmm. That doesn't look good.
For comparison, here is the output from the vanilla kernel for those two
runs:
total priority: 880
total data transferred: 954272
class prio ideal xferred %diff
be 0 195192 229108 17
be 1 173504 202740 16
be 2 151816 156660 3
be 3 130128 152052 16
be 4 108440 91636 -16
be 5 86752 64244 -26
be 6 65064 34292 -48
be 7 43376 23540 -46
total priority: 800
total data transferred: 887264
class prio ideal xferred %diff
be 4 110908 124404 12
be 4 110908 123380 11
be 4 110908 118004 6
be 4 110908 113396 2
be 4 110908 107252 -4
be 4 110908 98356 -12
be 4 110908 96244 -14
be 4 110908 106228 -5
It's worth noting that the overall throughput went up in the patched
kernel for this second case. However, if we care at all about the
notion of I/O priorities, I think your patch needs more work.
Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists