linux-kernel - Re: [Fwd: [RFC] cfq: adapt slice to number of processes doing I/O (v2.1)]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ABB3A0E.2090204@cn.fujitsu.com>
Date:	Thu, 24 Sep 2009 17:21:18 +0800
From:	Shan Wei <shanwei@...fujitsu.com>
To:	czoccolo@...il.com
CC:	Jens Axboe <jens.axboe@...cle.com>, Jeff Moyer <jmoyer@...hat.com>,
	linux-kernel@...r.kernel.org, Shan Wei <shanwei@...fujitsu.com>
Subject: Re: [Fwd: [RFC] cfq: adapt slice to number of processes doing I/O
 (v2.1)]

> Subject:
> [RFC] cfq: adapt slice to number of processes doing I/O (v2.1)
>
> When the number of processes performing I/O concurrently increases,
> a fixed time slice per process will cause large latencies.
> 
> This (v2.1) patch will scale the time slice assigned to each process,
> according to a target latency (tunable from sysfs, default 300ms).
> 
> In order to keep fairness among processes, we adopt two devices, w.r.t. v1.
> 
> * The number of active processes is computed using a special form of
> running average, that quickly follows sudden increases (to keep latency low),
> and decrease slowly (to have fairness in spite of rapid decreases of this 
> value).
> 
> * The idle time is computed using remaining slice as a maximum.
> 
> To safeguard sequential bandwidth, we impose a minimum time slice
> (computed using 2*cfq_slice_idle as base, adjusted according to priority
> and async-ness).
> 
> Signed-off-by: Corrado Zoccolo <czoccolo@...il.com>
> 

I'm interested in the idea of dynamically tuning the time slice according to the number of
processes. I have tested your patch using Jeff's tool on kernel-2.6.30-rc4. 
>From the following test result, after applying your patch, the fairness is not good 
as original kernel, e.g. io priority of 4 vs 5 in be0-through-7.fio case. 
And the throughout(total data transferred) becomes lower.
Have you tested buffered write, multi-threads?

Additionally i have a question about the minimum time slice, see the comment in your patch. 

*Original*(2.6.30-rc4 without patch):
/cfq-regression-tests/2.6.30-rc4-log/be0-through-7.fio 
total priority: 880
total data transferred: 535872
class	prio	ideal	xferred	%diff
be	0	109610	149748	36
be	1	97431	104436	7
be	2	85252	91124	6
be	3	73073	64244	-13
be	4	60894	59028	-4
be	5	48715	38132	-22
be	6	36536	21492	-42
be	7	24357	7668	-69

/cfq-regression-tests/2.6.30-rc4-log/be0-vs-be1.fio 
total priority: 340
total data transferred: 556008
class	prio	ideal	xferred	%diff
be	0	294357	402164	36
be	1	261650	153844	-42

/cfq-regression-tests/2.6.30-rc4-log/be0-vs-be7.fio 
total priority: 220
total data transferred: 537064
class	prio	ideal	xferred	%diff
be	0	439416	466164	6
be	7	97648	70900	-28

/cfq-regression-tests/2.6.30-rc4-log/be4-x-3.fio 
total priority: 300
total data transferred: 532964
class	prio	ideal	xferred	%diff
be	4	177654	199260	12
be	4	177654	149748	-16
be	4	177654	183956	3

/cfq-regression-tests/2.6.30-rc4-log/be4-x-8.fio 
total priority: 800
total data transferred: 516384
class	prio	ideal	xferred	%diff
be	4	64548	78580	21
be	4	64548	76436	18
be	4	64548	75764	17
be	4	64548	70900	9
be	4	64548	42388	-35
be	4	64548	73780	14
be	4	64548	30708	-53
be	4	64548	67828	5


*Applied patch*(2.6.30-rc4 with patch):

/cfq-regression-tests/log-result/be0-through-7.fio 
total priority: 880
total data transferred: 493824
class	prio	ideal	xferred	%diff
be	0	101009	224852	122
be	1	89786	106996	19
be	2	78562	70388	-11
be	3	67339	38900	-43
be	4	56116	18420	-68
be	5	44893	19700	-57
be	6	33669	9972	-71
be	7	22446	4596	-80

/cfq-regression-tests/log-result/be0-vs-be1.fio 
total priority: 340
total data transferred: 537064
class	prio	ideal	xferred	%diff
be	0	284328	375540	32
be	1	252736	161524	-37

/cfq-regression-tests/log-result/be0-vs-be7.fio 
total priority: 220
total data transferred: 551912
class	prio	ideal	xferred	%diff
be	0	451564	499956	10
be	7	100347	51956	-49

/cfq-regression-tests/log-result/be4-x-3.fio 
total priority: 300
total data transferred: 509404
class	prio	ideal	xferred	%diff
be	4	169801	196596	15
be	4	169801	198388	16
be	4	169801	114420	-33

/cfq-regression-tests/log-result/be4-x-8.fio 
total priority: 800
total data transferred: 459072
class	prio	ideal	xferred	%diff
be	4	57384	70644	23
be	4	57384	52980	-8
be	4	57384	62356	8
be	4	57384	60660	5
be	4	57384	55028	-5
be	4	57384	69620	21
be	4	57384	51956	-10
be	4	57384	35828	-38


Hardware infos.:
CPU: GenuineIntel Intel(R) Xeon(TM) CPU 3.00GHz 
     (4 logic cpus with hyper-thread on)
memory:2G
HDD:scsi

> ---
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 0e3814b..ca90d42 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -27,6 +27,8 @@ static const int cfq_slice_sync = HZ / 10;
>  static int cfq_slice_async = HZ / 25;
>  static const int cfq_slice_async_rq = 2;
>  static int cfq_slice_idle = HZ / 125;
> +static int cfq_target_latency = HZ * 3/10; /* 300 ms */
> +static int cfq_hist_divisor = 4;
>  
>  /*
>   * offset from end of service tree
> @@ -134,6 +136,9 @@ struct cfq_data {
>  	struct rb_root prio_trees[CFQ_PRIO_LISTS];
>  
>  	unsigned int busy_queues;
> +	unsigned int busy_queues_avg;
> +	unsigned int busy_rt_queues;
> +	unsigned int busy_rt_queues_avg;
>  
>  	int rq_in_driver[2];
>  	int sync_flight;
> @@ -173,6 +178,8 @@ struct cfq_data {
>  	unsigned int cfq_slice[2];
>  	unsigned int cfq_slice_async_rq;
>  	unsigned int cfq_slice_idle;
> +	unsigned int cfq_target_latency;
> +	unsigned int cfq_hist_divisor;
>  
>  	struct list_head cic_list;
>  
> @@ -301,10 +308,40 @@ cfq_prio_to_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  	return cfq_prio_slice(cfqd, cfq_cfqq_sync(cfqq), cfqq->ioprio);
>  }
>  
> +static inline unsigned
> +cfq_get_interested_queues(struct cfq_data *cfqd, bool rt) {
> +	unsigned min_q, max_q;
> +	unsigned mult = cfqd->cfq_hist_divisor - 1;
> +	unsigned round = cfqd->cfq_hist_divisor / 2;
> +	if (rt) {
> +		min_q = min(cfqd->busy_rt_queues_avg, cfqd->busy_rt_queues);
> +		max_q = max(cfqd->busy_rt_queues_avg, cfqd->busy_rt_queues);
> +		cfqd->busy_rt_queues_avg = (mult * max_q + min_q + round) /
> +			cfqd->cfq_hist_divisor;
> +		return cfqd->busy_rt_queues_avg;
> +	} else {
> +		min_q = min(cfqd->busy_queues_avg, cfqd->busy_queues);
> +		max_q = max(cfqd->busy_queues_avg, cfqd->busy_queues);
> +		cfqd->busy_queues_avg = (mult * max_q + min_q + round) /
> +			cfqd->cfq_hist_divisor;
> +		return cfqd->busy_queues_avg;
> +	}
> +}
> +
>  static inline void
>  cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
> -	cfqq->slice_end = cfq_prio_to_slice(cfqd, cfqq) + jiffies;
> +	unsigned process_thr = cfqd->cfq_target_latency / cfqd->cfq_slice[1];
> +	unsigned iq = cfq_get_interested_queues(cfqd, cfq_class_rt(cfqq));
> +	unsigned slice = cfq_prio_to_slice(cfqd, cfqq);
> +
> +	if (iq > process_thr) {
> +		unsigned low_slice = 2 * slice * cfqd->cfq_slice_idle
> +			/ cfqd->cfq_slice[1];

For sync queue, the minimum time slice is decided by slice_idle, base time slice and io priority.
But for async queue, why is the minimum time slice also limited by base time slice of sync queue?


Best Regards
-----
Shan Wei


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/