[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20110221153243.GE6428@redhat.com>
Date: Mon, 21 Feb 2011 10:32:43 -0500
From: Vivek Goyal <vgoyal@...hat.com>
To: Chad Talbott <ctalbott@...gle.com>
Cc: jaxboe@...ionio.com, guijianfeng@...fujitsu.com, mrubin@...gle.com,
teravest@...gle.com, jmoyer@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] Avoid preferential treatment of groups that aren't
backlogged
On Fri, Feb 18, 2011 at 02:19:16PM -0800, Chad Talbott wrote:
> Problem: If a group isn't backlogged, we remove it from the service
> tree. When it becomes active again, it gets either the minimum vtime
> of the tree or gets put at the "back of the line." That is true even
> when the group was idle for a very small time, and it consumed some IO
> time right before it became idle. If group has very small weight, it
> can end up using more disk time than its fair share. Conversely, if it
> has a very large weight, being put at the back of the vtime line is a
> large penalty which prevents it consuming its share.
>
> Solution: We solve the problem by assigning the group its old vtime if
> it has not been idle long enough. Otherise we assign it the service
> tree's min vtime.
>
> Complications: When an entire service tree becomes completely idle, we
> lose the vtime state. All the old vtime values are not relevant any
> more. For example, consider the case when the service tree is idle and
> a brand new group sends IO. That group would have an old vtime value
> of zero, but the service tree's vtime would become closer to zero. In
> such a case, it would be unfair for the older groups to get a much
> higher old vtime stored in them.
>
> We solve that issue by keeping a generation number that counts the
> number of instances when the service tree becomes completely
> empty. The generation number is stored in each group too. If a group
> becomes backlogged after a service tree has been empty, we compare its
> stored generation number with the current service tree generation
> number, and discard the old vtime if the group generation number is
> stale.
>
> The preemption specific code is taken care of automatically because we
> allow preemption checks after inserting a group back into the service
> tree and assigning it an appropriate vtime.
>
> Signed-off-by: Chad Talbott <ctalbott@...gle.com>
Looks good to me.
Acked-by: Vivek Goyal <vgoyal@...hat.com>
Vivek
> ---
> block/cfq-iosched.c | 27 ++++++++++++++-------------
> 1 files changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 501ffdf..8a1b82f 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -178,6 +178,7 @@ struct cfq_group {
>
> /* group service_tree key */
> u64 vdisktime;
> + u64 generation_num;
> unsigned int weight;
>
> /* number of cfqq currently on this group */
> @@ -300,6 +301,9 @@ struct cfq_data {
> /* List of cfq groups being managed on this device*/
> struct hlist_head cfqg_list;
> struct rcu_head rcu;
> +
> + /* Generation number, counts service tree empty events */
> + u64 active_generation;
> };
>
> static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> @@ -866,25 +870,19 @@ static void
> cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
> {
> struct cfq_rb_root *st = &cfqd->grp_service_tree;
> - struct cfq_group *__cfqg;
> - struct rb_node *n;
>
> cfqg->nr_cfqq++;
> if (!RB_EMPTY_NODE(&cfqg->rb_node))
> return;
>
> - /*
> - * Currently put the group at the end. Later implement something
> - * so that groups get lesser vtime based on their weights, so that
> - * if group does not loose all if it was not continously backlogged.
> - */
> - n = rb_last(&st->rb);
> - if (n) {
> - __cfqg = rb_entry_cfqg(n);
> - cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
> - } else
> + if (cfqd->active_generation > cfqg->generation_num)
> cfqg->vdisktime = st->min_vdisktime;
> -
> + else
> + /* We assume that vdisktime was not modified when the task
> + was off the service tree.
> + */
> + cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
> + cfqg->vdisktime);
> __cfq_group_service_tree_add(st, cfqg);
> st->total_weight += cfqg->weight;
> }
> @@ -906,6 +904,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
> if (!RB_EMPTY_NODE(&cfqg->rb_node))
> cfq_rb_erase(&cfqg->rb_node, st);
> cfqg->saved_workload_slice = 0;
> + cfqg->generation_num = cfqd->active_generation;
> + if (RB_EMPTY_ROOT(&st->rb))
> + cfqd->active_generation++;
> cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists