[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4D0A6D34.6070806@redhat.com>
Date: Thu, 16 Dec 2010 14:49:08 -0500
From: Rik van Riel <riel@...hat.com>
To: Mike Galbraith <efault@....de>
CC: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Avi Kiviti <avi@...hat.com>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Chris Wright <chrisw@...s-sol.org>
Subject: Re: [RFC -v2 PATCH 2/3] sched: add yield_to function
On 12/14/2010 01:08 AM, Mike Galbraith wrote:
> On Mon, 2010-12-13 at 22:46 -0500, Rik van Riel wrote:
>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index dc91a4d..6399641 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -5166,6 +5166,46 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
>> return ret;
>> }
>>
>> +/*
>> + * Yield the CPU, giving the remainder of our time slice to task p.
>> + * Typically used to hand CPU time to another thread inside the same
>> + * process, eg. when p holds a resource other threads are waiting for.
>> + * Giving priority to p may help get that resource released sooner.
>> + */
>> +void yield_to(struct task_struct *p)
>> +{
>> + unsigned long flags;
>> + struct rq *rq, *p_rq;
>> +
>> + local_irq_save(flags);
>> + rq = this_rq();
>> +again:
>> + p_rq = task_rq(p);
>> + double_rq_lock(rq, p_rq);
>> + if (p_rq != task_rq(p)) {
>> + double_rq_unlock(rq, p_rq);
>> + goto again;
>> + }
>> +
>> + /* We can't yield to a process that doesn't want to run. */
>> + if (!p->se.on_rq)
>> + goto out;
>> +
>> + /*
>> + * We can only yield to a runnable task, in the same schedule class
>> + * as the current task, if the schedule class implements yield_to_task.
>> + */
>> + if (!task_running(rq, p)&& current->sched_class == p->sched_class&&
>> + current->sched_class->yield_to)
>> + current->sched_class->yield_to(rq, p);
>> +
>> +out:
>> + double_rq_unlock(rq, p_rq);
>> + local_irq_restore(flags);
>> + yield();
>> +}
>> +EXPORT_SYMBOL_GPL(yield_to);
>
> That part looks ok, except for the yield cross cpu bit. Trying to yield
> a resource you don't have doesn't make much sense to me.
The current task just donated the rest of its timeslice.
Surely that makes it a reasonable idea to call yield, and
get one of the other tasks on the current CPU running for
a bit?
> <ramble>
> slice_remain() measures the distance to your last preemption, which has
> no relationship with entitlement. sched_slice() is not used to issue
> entitlement, it's only a ruler.
>
> You have entitlement on your current runqueue only, that entitlement
> being the instantaneous distance to min_vruntime in a closed and fluid
> system. You can't inject some instantaneous relationship from one
> closed system into an another without making the math go kind of fuzzy,
> so you need tight constraints on how fuzzy it can get.
>
> We do that with migrations, inject fuzz. There is no global fair-stick,
> but we invent one by injecting little bits of fuzz. It's constrained by
> chaos and the magnitude constraints of the common engine. The more you
> migrate, the more tightly you couple systems. As long as we stay fairly
> well balanced, we can migrate without the fuzz getting out of hand, and
> end up with a globally ~fair system.
>
> What you're injecting isn't instantaneously irrelevant lag-fuzz, which
> distributed over time becomes relevant. you're inventing entitlement out
> of the void. Likely not a big hairy deal unless you do it frequently,
> but you're doing something completely bogus and seemingly unconstrained.
> </ramble>
I'm open to suggestions on what to do instead.
>> +static void yield_to_fair(struct rq *rq, struct task_struct *p)
>> +{
>> + struct sched_entity *se =&p->se;
>> + struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> + u64 remain = slice_remain(current);
>> +
>> + dequeue_task(rq, p, 0);
>> + se->vruntime -= remain;
>> + if (se->vruntime< cfs_rq->min_vruntime)
>> + se->vruntime = cfs_rq->min_vruntime;
>
> This has an excellent chance of moving the recipient rightward.. and the
> yielding task didn't yield anything. This may achieve the desired
> result or may just create a nasty latency spike... but it makes no
> arithmetic sense.
Good point, the current task calls yield() in the function
that calls yield_to_fair, but I seem to have lost the code
that penalizes the current task's runtime...
I'll reinstate that.
--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists