linux-kernel - Re: [PATCH] cfq-iosched: queue groups more gracefully

Message-ID: <4E047145.8050601@parallels.com>
Date:	Fri, 24 Jun 2011 15:13:09 +0400
From:	Konstantin Khlebnikov <khlebnikov@...allels.com>
To:	Vivek Goyal <vgoyal@...hat.com>
CC:	Jens Axboe <axboe@...nel.dk>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cfq-iosched: queue groups more gracefully

Vivek Goyal wrote:
> On Thu, Jun 23, 2011 at 08:22:06PM +0400, Konstantin Khlebnikov wrote:
>> This patch queues awakened cfq-groups according to their current vdisktime;
>> it tries to save up to one group timeslice of unused virtual disk time.
>> Thus a group does not lose everything if it was not continuously backlogged.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...nvz.org>
>
> I think this patch is not required till we start preemption across
> groups? Any more details of actual use will help.

I saw fairness and latency problems between groups doing intensive parallel IO
and interactive groups: cfq always puts interactive groups at the end, so their
latency is extremely high. With this patch, interactive groups get a real chance
to be scheduled much earlier. I'm sorry, I cannot show simple test cases right now.

>
>> ---
>>   block/cfq-iosched.c |   36 ++++++++++++++++++++++++++++++------
>>   1 files changed, 30 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index c71533e..d5c7c79 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -592,6 +592,26 @@ cfq_group_slice(struct cfq_data *cfqd, struct cfq_group *cfqg)
>>   	return cfq_target_latency * cfqg->weight / st->total_weight;
>>   }
>>
>> +static inline u64
>> +cfq_group_vslice(struct cfq_data *cfqd, struct cfq_group *cfqg)
>> +{
>> +	struct cfq_rb_root *st = &cfqd->grp_service_tree;
>> +	u64 vslice;
>> +
>> +	/* There are no group slices in iops mode */
>> +	if (iops_mode(cfqd))
>> +		return 0;
>> +
>> +	/*
>> +	 * Equal to cfq_scale_slice(cfq_group_slice(cfqd, cfqg), cfqg).
>> +	 * Add group weight because it is currently not in the service tree.
>> +	 */
>> +	vslice = (u64)cfq_target_latency << CFQ_SERVICE_SHIFT;
>> +	vslice *= BLKIO_WEIGHT_DEFAULT;
>> +	do_div(vslice, st->total_weight + cfqg->weight);
>
> Above is not equivalent to cfq_scale_slice(cfq_group_slice(cfqd, cfqg),
> cfqg) as comment says.
>
> you are not calculating cfq_group_slice(). Instead using cfq_target_latency.

No, this expression gives the same value as cfq_scale_slice(cfq_group_slice())
once the group has been added to the service tree. It equals the slice that the
group would receive if it were queued immediately after the addition.
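
To make the claimed equivalence concrete, here is a minimal user-space sketch, not kernel code. The function names are hypothetical stand-ins for cfq_group_slice(), cfq_scale_slice() and cfq_group_vslice(). The constants (a shift of 12, a default weight of 500) and the example latency of 300 are assumptions for illustration. The two forms can differ by integer rounding, because the two-step form divides twice.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed illustrative values standing in for the kernel constants. */
#define SERVICE_SHIFT  12   /* stand-in for CFQ_SERVICE_SHIFT */
#define WEIGHT_DEFAULT 500  /* stand-in for BLKIO_WEIGHT_DEFAULT */

/* Wall-clock slice of a group holding `weight` out of `total_weight`. */
static uint64_t group_slice(uint64_t target_latency, uint64_t weight,
			    uint64_t total_weight)
{
	assert(total_weight > 0);
	return target_latency * weight / total_weight;
}

/* Convert a wall-clock slice to virtual time, inversely to the weight. */
static uint64_t scale_slice(uint64_t delta, uint64_t weight)
{
	assert(weight > 0);
	return (delta << SERVICE_SHIFT) * WEIGHT_DEFAULT / weight;
}

/*
 * Direct form used in the patch. `total_before` does not yet include the
 * group, so its weight is added to the divisor.
 */
static uint64_t group_vslice(uint64_t target_latency, uint64_t weight,
			     uint64_t total_before)
{
	return (target_latency << SERVICE_SHIFT) * WEIGHT_DEFAULT /
	       (total_before + weight);
}
```

With a latency of 300, a group of weight 500 and 1000 of weight already on the tree, group_vslice(300, 500, 1000) and scale_slice(group_slice(300, 500, 1500), 500) both give 409600. The group weight cancels out of the product, leaving only the total weight in the divisor, which is why a heavier group ends up with a smaller virtual slice.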

>
> Also it does not make sense. A higher weight group gets lower vslice
> and in turn gets put further away on the tree. This is reverse of what
> you want.
>
>> +	return vslice;
>> +}
>> +
>>   static inline unsigned
>>   cfq_scaled_cfqq_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>>   {
>> @@ -884,16 +904,20 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>>   		return;
>>
>>   	/*
>> -	 * Currently put the group at the end. Later implement something
>> -	 * so that groups get lesser vtime based on their weights, so that
>> -	 * if group does not loose all if it was not continuously backlogged.
>> +	 * Bump vdisktime to be greater than or equal to min_vdisktime.
>> +	 */
>> +	cfqg->vdisktime = max_vdisktime(cfqg->vdisktime, st->min_vdisktime);
>> +
>
> why do we need to do this?

Time should not go backwards; that is dangerous.

>
>> +	/*
>> +	 * Put the group at the end, but save one slice from unused time.
>>   	 */
>>   	n = rb_last(&st->rb);
>>   	if (n) {
>>   		__cfqg = rb_entry_cfqg(n);
>> -		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
>> -	} else
>> -		cfqg->vdisktime = st->min_vdisktime;
>> +		cfqg->vdisktime = max_vdisktime(cfqg->vdisktime,
> 						^^^^^^^
> I think you meant st->min_vdisktime here?

No, I adjust the group's vdisktime to put it at the end while saving up to one slice.
There may be a problem with overlap, though, on wakeup after a very long sleep.

>> +				__cfqg->vdisktime -
>> +					cfq_group_vslice(cfqd, cfqg));
>> +	}
>>   	cfq_group_service_tree_add(st, cfqg);
>>   }
>>
>
> Thanks
> Vivek
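
The placement rule in the last hunk can be sketched in plain C as follows. This is a simplified user-space model under stated assumptions, not the kernel code: later_vtime() is a hypothetical stand-in for max_vdisktime(), and the tree state is passed in as plain values (`have_last` and `last_vtime` describe the rightmost group, if any).

```c
#include <stdint.h>

/*
 * Later of two virtual times, compared through a signed difference so the
 * result survives counter wraparound. The ordering is only meaningful
 * while the two values are less than 2^63 apart.
 */
static uint64_t later_vtime(uint64_t a, uint64_t b)
{
	int64_t delta = (int64_t)(b - a);

	return delta > 0 ? b : a;
}

/* vdisktime assigned to a group that becomes backlogged again. */
static uint64_t wakeup_vdisktime(uint64_t own, uint64_t min_vtime,
				 int have_last, uint64_t last_vtime,
				 uint64_t vslice)
{
	/* Step 1: never let the group's time fall behind min_vdisktime. */
	uint64_t v = later_vtime(own, min_vtime);

	/* Step 2: sit at most one virtual slice ahead of the tail. */
	if (have_last)
		v = later_vtime(v, last_vtime - vslice);
	return v;
}
```

For example, with min_vtime 1000, a tail at 5000 and a vslice of 400, a group that slept for a long time (own = 0) lands at 4600, while a group that was active recently (own = 4900) keeps 4900 and still queues before the tail.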
