Message-ID: <20091104190033.GG2870@redhat.com>
Date: Wed, 4 Nov 2009 14:00:33 -0500
From: Vivek Goyal <vgoyal@...hat.com>
To: Divyesh Shah <dpshah@...gle.com>
Cc: Jeff Moyer <jmoyer@...hat.com>, linux-kernel@...r.kernel.org,
jens.axboe@...cle.com, nauman@...gle.com, lizf@...fujitsu.com,
ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
taka@...inux.co.jp, guijianfeng@...fujitsu.com,
balbir@...ux.vnet.ibm.com, righi.andrea@...il.com,
m-ikeda@...jp.nec.com, akpm@...ux-foundation.org, riel@...hat.com,
kamezawa.hiroyu@...fujitsu.com
Subject: Re: [PATCH 03/20] blkio: Introduce the notion of weights
On Wed, Nov 04, 2009 at 09:07:41AM -0800, Divyesh Shah wrote:
> On Wed, Nov 4, 2009 at 7:41 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
> >
> > On Wed, Nov 04, 2009 at 10:06:16AM -0500, Jeff Moyer wrote:
> > > Vivek Goyal <vgoyal@...hat.com> writes:
> > >
> > > > o Introduce the notion of weights. Priorities are mapped to weights internally.
> > > > These weights will be useful once IO groups are introduced and group's share
> > > > will be decided by the group weight.
> > >
> > > I'm sorry, but I need more background to review this patch. Where do
> > > the min and max come from? Why do you scale 7-0 from 200-900? How does
> > > this map to what was there before (exactly, approximately)?
> > >
> >
> > Well, so far we only have the notion of ioprio for a process, and
> > based on that we determine the time slice length.
> >
> > Soon we will throw cfq groups into the mix as well. Because the cpu
> > controller is weight driven, people have shown a preference for
> > deciding a group's share based on its weight, rather than introducing
> > a notion of ioprio for groups.
> >
> > So now the core scheduling algorithm recognizes only weights for
> > entities (be they cfq queues or cfq groups), and we need to convert
> > the ioprio of a cfqq into a weight.
> >
> > Now it is a matter of deciding what weight range we support and how
> > ioprio should be mapped onto these weights. We can always change the
> > mappings, but to begin with, I have done the following.
> >
> > Allow a weight range from 100 to 1000. Allowing very small weights
> > like "1" can lead to very interesting corner cases, and I wanted to
> > avoid that in the first implementation. For example, if some group
> > with weight "1" gets a time slice of 100ms, its vtime will be really
> > high and after that it will not get scheduled in for a very long time.
> >
> > Secondly, allowing very small weights can make the vtime of the tree
> > move very fast, with faster wrap-around of min_vdisktime. This is
> > especially true on SSDs, where idling might not be enabled and every
> > queue expiry will be charged a minimum slice of 1ms; if the weight of
> > the group is "1", it will accumulate a much higher vtime and
> > min_vdisktime will move very fast. We don't want too fast a
> > wrap-around of min_vdisktime (especially in the case of the idle tree;
> > that infrastructure is not part of the current patches).
> >
> > Hence, to begin with, I wanted to limit the range of allowed weights,
> > because a wider range opens up a lot of interesting corner cases.
> > That's why I limited the minimum weight to 100. So at most, users can
> > expect a 1000/100 = 10 times service differentiation between the
> > highest and lowest weight groups. If folks need more than that, we can
> > look into it once things stabilize.
>
> We definitely need the 1:100 differentiation. I'm ok with adding that
> later after the core set of patches stabilize but just letting you
> know that it is important to us.
Good to know. I will begin with a maximum service difference of 10 times
and, once things stabilize, will enable a wider range of weights.
> Also curious why you chose a higher
> range 100-1000 instead of 10-100? For smaller vtime leaps?
Good question. Initially we had thought that a range of 1-1000 should be
good enough. Later we decided to cap the minimum weight at 100. But the
same can be achieved with a smaller range of 1-100 and capping the minimum
weight at 10. This will also make vtime leap forward more slowly.
Later, if somebody needs a ratio higher than 1:100, we can think of
supporting an even wider weight range.
Thanks Divyesh for the idea. I think I will change the weight range to
10-100 and map ioprio 0-7 onto weights 90-20 (prio 0 gets 90, prio 7
gets 20).
Thanks
Vivek
>
> >
> > Priority and weight run in reverse order: a higher prio value (that
> > is, a lower priority) means a lower weight, and vice versa.
> >
> > Currently we support 8 priority levels, and prio 4 is the mid point.
> > Prio values above 4 get a shorter slice than prio 4, and prio values
> > below 4 get a longer one, in steps of 20% of the prio 4 slice for each
> > priority level (so prio 5 gets 80% and prio 3 gets 120%).
> >
> > For the weight range 100-1000, 500 can be considered the mid point.
> > This is what the priority mapping looks like:
> >
> >   Weight:  100  200  300  400  500  600  700  800  900 1000
> >   ioprio:    -    7    6    5    4    3    2    1    0    -
> >
> > By choosing 500 as the mid point and mapping prio 0-7 to weights
> > 900-200, we retain the 20% difference between prio levels once
> > priorities are converted to weights (each step of 100 is 20% of 500),
> > hence this mapping.
> >
> > I am all ears if you have any suggestions on how this can be handled
> > better.
> >
> > Thanks
> > Vivek