Message-ID: <20090528160003.GA4335@redhat.com>
Date: Thu, 28 May 2009 12:00:03 -0400
From: Vivek Goyal <vgoyal@...hat.com>
To: Nauman Rafique <nauman@...gle.com>
Cc: linux-kernel@...r.kernel.org,
containers@...ts.linux-foundation.org, dm-devel@...hat.com,
jens.axboe@...cle.com, dpshah@...gle.com, lizf@...fujitsu.com,
mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
taka@...inux.co.jp, guijianfeng@...fujitsu.com, jmoyer@...hat.com,
dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
righi.andrea@...il.com, m-ikeda@...jp.nec.com, jbaron@...hat.com,
agk@...hat.com, snitzer@...hat.com, akpm@...ux-foundation.org,
peterz@...radead.org
Subject: Re: [PATCH 02/20] io-controller: Common flat fair queuing code in
elevator layer
On Wed, May 27, 2009 at 01:53:59PM -0700, Nauman Rafique wrote:
[..]
> > +/**
> > + * __bfq_lookup_next_entity - return the first eligible entity in @st.
> > + * @st: the service tree.
> > + *
> > + * Update the virtual time in @st and return the first eligible entity
> > + * it contains.
> > + */
> > +static struct io_entity *__bfq_lookup_next_entity(struct io_service_tree *st)
> > +{
> > + struct io_entity *entity;
> > +
> > + if (RB_EMPTY_ROOT(&st->active))
> > + return NULL;
> > +
> > + bfq_update_vtime(st);
>
> Vivek, Paolo, Fabio,
> Over here we call bfq_update_vtime(), and this function could have
> been called even when we are just doing a lookup (and not an extract).
> So vtime is updated while we are not really selecting the next queue
> for service (for example, see elv_preempt_queue()). This can result
> in a call to update_vtime when only an entity with small weight (say
> weight=1) is backlogged and another entity with bigger weight (say 10)
> is getting serviced so it is not in the tree (we extract the entity
> which is getting service). This results in a big vtime jump to the
> start time of the entity with weight 1 (entity of weight 1 would have
> big start times, as it has small weight). Now when another entity with
> bigger weight (say 90) gets backlogged, it is assigned its start time
> from the service tree's vtime, causing it to get a big value. In the
> meantime, the iog with weight 10 keeps getting service for many quanta,
> as it was continuously backlogged.
>
> The problem happens because we extract an entity (removing it from the
> tree) while it is getting service, and do vtime jumps based on what is
> still in the tree. I think we need to add an extra check on the vtime
> of the entity in service, before we take a vtime jump.
>
> I have actually seen this happening when trying to debug one of my
> tests. Please let me know what you think.
>
Hi Nauman,
This sounds like a problem but apart from elv_preempt_queue(), in which path
do you see it?
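
Just to make sure I am reading the scenario right, here it is with some
made-up numbers (simplified WF2Q+ style bookkeeping: each entity has a
start time S, and vtime jumps up to the smallest S left in the tree):

    vtime = 100
    A: weight 10, currently in service, extracted from the tree (S ~ 100)
    B: weight 1, backlogged, S = 600 (large, because its previous
       finish time was large)

    A lookup path calls bfq_update_vtime():
        only B is left in the tree, so vtime jumps 100 -> 600

    C: weight 90, becomes backlogged now:
        S(C) = vtime = 600, far ahead of A's timestamps

    A stays continuously backlogged and keeps getting service, while C
    pays for the artificial jump.
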
Initially Fabio posted the patches where preemption was allowed only if
the preempting queue is the next one to be served. To keep the behavior
the same as CFQ, I have changed that logic: if we decide to preempt the
current queue (based on several checks, like sync vs async), I expire the
current queue and add the new queue to the front of the service tree so
that it is selected next. It does impact fairness a bit, but that's how
CFQ does it today, and the concept of preemption within the same prio
class is somewhat peculiar.
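
Roughly, the flow is the following (hand-wavy sketch, not the actual
patch; the function and structure names here are made up for
illustration):

    /*
     * Sketch only: names are illustrative, not necessarily what the
     * real code uses.
     */
    static void preempt_queue_sketch(struct elv_fq_data *efqd,
                                     struct io_queue *new_ioq)
    {
            /* Expire the queue that is currently being served. */
            expire_active_queue(efqd);

            /*
             * Put the preempting queue at the front of the service
             * tree so that it is selected next, the way CFQ does it
             * with its add_front logic.
             */
            requeue_io_entity_front(&new_ioq->entity);
    }
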
So in the latest patches the preemption path is covered. There do not
seem to be any other paths where we do lookups while an active entity is
being served. For example, update_next_active() does not continue with
the lookup if there is an active entity. So with this version of the
patch, you should not face the issue.
In general, I think it is a good idea to fix this in update_vtime()
itself, so that vtime is not updated while there is an active entity,
instead of relying on every caller to make sure update_vtime() is not
called in that case. Do you have a quick patch for that?
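
Something along these lines, perhaps (completely untested; the
in_service_entity() helper is made up, the real check would presumably
look at whatever the sched_data owning @st uses to track the entity
currently being served):

    static void bfq_update_vtime(struct io_service_tree *st)
    {
            struct io_entity *entry;
            struct rb_node *node = st->active.rb_node;

            /*
             * The entity being served has been extracted from the
             * tree, so the smallest start time still in the tree is
             * not a safe value to jump vtime to. Skip the update in
             * that case.
             */
            if (in_service_entity(st) != NULL)      /* made-up helper */
                    return;

            entry = rb_entry(node, struct io_entity, rb_node);
            if (bfq_gt(entry->min_start, st->vtime))
                    st->vtime = entry->min_start;
    }
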
Thanks
Vivek