[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <193b0f820911091312y125209deufadd1040aff65cfd@mail.gmail.com>
Date: Mon, 9 Nov 2009 19:12:35 -0200
From: Lucas De Marchi <lucas.de.marchi@...il.com>
To: Gregory Haskins <ghaskins@...ell.com>
Cc: Ingo Molnar <mingo@...e.hu>, Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: [lucas.de.marchi@...il.com: Bug when changing cpus_allowed of RT
tasks?]
On Mon, Nov 9, 2009 at 17:35, Gregory Haskins <ghaskins@...ell.com> wrote:
> Hi Lucas,
>
> Ingo asked me to take a look at the problem you are reporting. Is this a bug you are seeing in the
> wild, or was this found by code-inspection? I took a look, and it looks to me that the original code
> is correct, but I will keep an open mind.
I found this "maybe"-bug in some changes I'm doing to balance of RT tasks. It
was not giving me the expected results so inspecting the code I supposed this
was the problem. Applying the patch it solved my problem, but I think it was a
problem with my code. Comments inlined.
>> static int select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
>> {
>> [...]
>> if (unlikely(rt_task(rq->curr)) &&
>> (p->rt.nr_cpus_allowed > 1)) {
>> int cpu = find_lowest_rq(p);
>>
>> return (cpu == -1) ? task_cpu(p) : cpu;
>> }
/*
* Otherwise, just let it ride on the affined RQ and the
* post-schedule router will push the preempted task away
*/
return task_cpu(p);
>> }
I completed the rest of function to emphasize it will return task_cpu(p)
afterwards.
>
> So the intent of this logic is to say "if the task is of type RT, and it can move, see if it can move
> elsewhere". Otherwise, we do not try to move it at all.
I'd say "if _current_ is of type RT, and _p_ can move, see if _p_ can move
elsewhere". And this check is repeated for p inside find_lowest_rq, so it would
not be needed here. Just let it call find_lowest_rq and -1 will be returned.
>> static int find_lowest_rq(struct task_struct *task)
>> {
[...]
>> if (task->rt.nr_cpus_allowed == 1)
>> return -1; /* No other targets possible */
>> [...]
>> }
>> What happens if rt.nr_cpus_allowed continues to be 1, but the allowed cpu is
>> not the one in which task is being woken up?
>
> The one case that I can think of that would fall into this category is during a sched_setaffinity()?
>
> If so, what happens is that we ultimately call set_cpus_allowed() -> set_cpus_allowed_ptr(). Inside
> this function, the new mask is validated independent of the nr_cpus_allowed value and we will
> migrate the task away if it is no longer on a valid core.
This was exactly the case I was falling into. But I was setting the affinity
inside the kernel, not passing through setaffinity syscall, and I was wrongly
calling set_cpus_allowed_rt directly instead of set_cpus_allowed, so task would
not migrate in that case.
>
> Until further evidence is presented, I have to respectfully NAK the patch, as I do not think its doing the right thing
> nor do I think the current code is actually broken.
I see now it's not doing the right thing. IMO only the double check of
rt.nr_cpus_allowed is superfluous, but not wrong.
Thanks for clarifications
Lucas De Marchi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists