Message-ID: <4877BD66.30802@qualcomm.com>
Date: Fri, 11 Jul 2008 13:07:02 -0700
From: Max Krasnyansky <maxk@...lcomm.com>
To: Vegard Nossum <vegard.nossum@...il.com>
CC: Paul Menage <menage@...gle.com>,
Dmitry Adamushko <dmitry.adamushko@...il.com>,
Paul Jackson <pj@....com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
rostedt@...dmis.org, Thomas Gleixner <tglx@...utronix.de>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken

Vegard Nossum wrote:
> On Fri, Jul 11, 2008 at 9:36 PM, Paul Menage <menage@...gle.com> wrote:
>> On Fri, Jul 11, 2008 at 12:07 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
>>> The result of having CPUSETS enabled as above is a 100% reproducible
>>> BUG on the very first cpu hot-unplug:
>>>
>>> ------------[ cut here ]------------
>>> kernel BUG at xxx/linux-2.6/kernel/sched.c:5859!
>> That doesn't quite match up with any BUG in 2.6.26-rc9 - what tree is
>> this last crash based on?
>
> latest mainline. Commit e5a5816f7875207cb0a0a7032e39a4686c5e10a4.
>
> Is this one:
>
> /* called under rq->lock with disabled interrupts */
> static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
> {
>         struct rq *rq = cpu_rq(dead_cpu);
>
>         /* Must be exiting, otherwise would be on tasklist. */
>         BUG_ON(!p->exit_state);
>
>>> Also, this is on the latest linux-2.6.git! Since we're so close to
>>> release, maybe cpusets should simply be marked BROKEN for now? (Unless
>>> we can fix it, of course. The alternative is to apply Miao Xie's
>>> workaround patch temporarily.)
>> If we were going to mark anything as broken, wouldn't cpu-hotplug be
>> the more appropriate victim? I suspect that there are more systems
>> using cpusets in production environments than using cpu hotplug. But
>> as you say, fixing it sounds better.
>
> I'm sorry for the harsh characterization and suggestion; please accept
> my apology. It was purely a result of my excitement at having made
> some progress in this case.
>
> But I have more good news; reverting this:
>
> commit f18f982abf183e91f435990d337164c7a43d1e6d
> Author: Max Krasnyansky <maxk@...lcomm.com>
> Date: Thu May 29 11:17:01 2008 -0700
>
> sched: CPU hotplug events must not destroy scheduler domains created by the
> cpusets
>
> The first issue is not related to cpusets. We're simply leaking doms_cur.
> It's allocated in arch_init_sched_domains(), which is called for every
> hotplug event, so we just keep reallocating doms_cur without ever freeing
> it. I introduced a free_sched_domains() function that cleans things up.
>
> The second issue is that sched domains created by cpusets are completely
> destroyed by CPU hotplug events. On every hotplug event the scheduler
> attaches all CPUs to the NULL domain and then puts them all back into a
> single default domain, thereby destroying the domains that cpusets had
> created via partition_sched_domains(). The solution is simple: when
> cpusets are enabled, the scheduler should not create the default domain
> and should instead let cpusets do it, which is exactly what the patch
> does.
>
> Signed-off-by: Max Krasnyansky <maxk@...lcomm.com>
> Cc: pj@....com
> Cc: menage@...gle.com
> Cc: rostedt@...dmis.org
> Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>
> gets rid of the BUG! (Added people to Ccs.)
Really? Just from looking at the backtraces in your first email, it
seems unrelated.
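
For context, the first half of that commit was a plain leak fix and
should be harmless: arch_init_sched_domains() runs on every hotplug
event and used to allocate a fresh doms_cur each time without freeing
the old one. The pattern is roughly this (a sketch of the idea, not the
literal diff; the fallback_doms handling here is schematic):

/* sketch of the doms_cur bookkeeping in kernel/sched.c (2.6.26-era) */
static cpumask_t *doms_cur;     /* masks of the current sched domains */
static int ndoms_cur;           /* number of entries in doms_cur */
static cpumask_t fallback_doms; /* static fallback, must never be kfree'd */

/* the helper the commit introduced, shown schematically */
void free_sched_domains(void)
{
        ndoms_cur = 0;
        if (doms_cur && doms_cur != &fallback_doms)
                kfree(doms_cur);
        doms_cur = &fallback_doms;
}

static int arch_init_sched_domains(const cpumask_t *cpu_map)
{
        /* before the fix, this path simply leaked the old doms_cur */
        free_sched_domains();
        ndoms_cur = 1;
        doms_cur = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
        if (!doms_cur)
                doms_cur = &fallback_doms;
        cpus_andnot(doms_cur[0], *cpu_map, cpu_isolated_map);
        return build_sched_domains(doms_cur);
}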
> Might I instead suggest a revert of this? (Again, unless somebody else
> can spot the real error and fix it before 2.6.26 is out :-))
I'd actually be OK with reverting it. Paul and I were looking into some
circular locking issues triggered by the very same patch. Since we do
not have a solution yet, we could revert it for now and work on a fix
during the .27-rc series.
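
For completeness, the second half of the commit is the part that
actually changes hotplug behavior: instead of the notifier collapsing
everything back into one default domain, it leaves the rebuild to
cpusets when they are compiled in. Again a rough sketch of the idea,
not the actual hunk (the real code differs in detail):

/* sketch: CPU hotplug notifier in kernel/sched.c */
static int update_sched_domains(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
{
        switch (action) {
        case CPU_UP_PREPARE:
        case CPU_DOWN_PREPARE:
                /* tear down domains while the CPU map is in flux */
                detach_destroy_domains(&cpu_online_map);
                free_sched_domains();
                return NOTIFY_OK;

        case CPU_UP_CANCELED:
        case CPU_DOWN_FAILED:
        case CPU_ONLINE:
        case CPU_DEAD:
#ifndef CONFIG_CPUSETS
                /* no cpusets: recreate the single default domain */
                arch_init_sched_domains(&cpu_online_map);
#else
                /*
                 * With cpusets enabled, do nothing here; the cpuset
                 * hotplug path calls rebuild_sched_domains() and
                 * recreates the user-defined partitions instead.
                 */
#endif
                return NOTIFY_OK;
        }
        return NOTIFY_DONE;
}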
Max