Open Source and information security mailing list archives
 
Date:	Fri, 11 Jul 2008 21:43:18 +0200
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Paul Menage" <menage@...gle.com>,
	"Max Krasnyansky" <maxk@...lcomm.com>
Cc:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>,
	"Paul Jackson" <pj@....com>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
	rostedt@...dmis.org, "Thomas Gleixner" <tglx@...utronix.de>,
	"Linux Kernel" <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken

On Fri, Jul 11, 2008 at 9:36 PM, Paul Menage <menage@...gle.com> wrote:
> On Fri, Jul 11, 2008 at 12:07 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
>>
>> The result of having CPUSETS enabled as above is a 100% reproducible
>> BUG on the very first cpu hot-unplug:
>>
>> ------------[ cut here ]------------
>> kernel BUG at xxx/linux-2.6/kernel/sched.c:5859!
>
> That doesn't quite match up with any BUG in 2.6.26-rc9 - what tree is
> this last crash based on?

Latest mainline, commit e5a5816f7875207cb0a0a7032e39a4686c5e10a4.

It is this BUG_ON() in migrate_dead():

/* called under rq->lock with disabled interrupts */
static void migrate_dead(unsigned int dead_cpu, struct task_struct *p)
{
        struct rq *rq = cpu_rq(dead_cpu);

        /* Must be exiting, otherwise would be on tasklist. */
        BUG_ON(!p->exit_state);
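The comment's reasoning ("must be exiting, otherwise would be on tasklist") can be modeled outside the kernel. The following is a minimal userspace sketch, assuming that live tasks are moved off the dead CPU by a task-list walk before its runqueue is drained, so that anything left over should be exiting. It is not kernel code; struct task, its fields, migrate_via_tasklist() and count_bug_on_hits() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task_struct fields this check relies on. */
struct task {
	const char *comm;
	int exit_state;    /* non-zero once the task has begun exiting */
	bool on_tasklist;  /* live tasks are reachable from the task list */
	int cpu;           /* CPU whose runqueue currently holds the task */
};

/* Step 1: walk the task list and move every live task off the dead CPU. */
void migrate_via_tasklist(struct task *t, size_t n, int dead_cpu,
			  int target_cpu)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].on_tasklist && t[i].cpu == dead_cpu)
			t[i].cpu = target_cpu;
}

/*
 * Step 2: inspect what is still on the dead CPU. Each remaining task is
 * expected to be exiting; count the ones that would trip the
 * BUG_ON(!p->exit_state) check modeled here.
 */
int count_bug_on_hits(const struct task *t, size_t n, int dead_cpu)
{
	int hits = 0;

	for (size_t i = 0; i < n; i++)
		if (t[i].cpu == dead_cpu && !t[i].exit_state)
			hits++;
	return hits;
}
```

Running step 1 and then step 2 on a model containing one live task that the task-list walk somehow misses yields exactly one hit. The sketch only restates the invariant; it does not say what produces such a task in the real crash.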

>> Also, this is on the latest linux-2.6.git! Since we're so close to
>> release, maybe cpusets should simply be marked BROKEN for now? (Unless
>> we can fix it, of course. The alternative is to apply Miao Xie's
>> workaround patch temporarily.)
>
> If we were going to mark anything as broken, wouldn't cpu-hotplug be
> the more appropriate victim? I suspect that there are more systems
> using cpusets in production environments than using cpu hotplug. But
> as you say, fixing it sounds better.

I'm sorry for the harsh characterization and suggestion; please accept
my apology. It was purely a result of my excitement at having made
some progress in this case.

But I have more good news; reverting this:

commit f18f982abf183e91f435990d337164c7a43d1e6d
Author: Max Krasnyansky <maxk@...lcomm.com>
Date:   Thu May 29 11:17:01 2008 -0700

    sched: CPU hotplug events must not destroy scheduler domains created by the cpusets

    First issue is not related to the cpusets. We're simply leaking doms_cur.
    It's allocated in arch_init_sched_domains() which is called for every
    hotplug event. So we just keep reallocating doms_cur without freeing it.
    I introduced free_sched_domains() function that cleans things up.

    Second issue is that sched domains created by the cpusets are
    completely destroyed by the CPU hotplug events. For all CPU hotplug
    events scheduler attaches all CPUs to the NULL domain and then puts
    them all into the single domain thereby destroying domains created
    by the cpusets (partition_sched_domains).
    The solution is simple: when cpusets are enabled, the scheduler should not
    create the default domain and should instead let cpusets do that, which is
    exactly what the patch does.

    Signed-off-by: Max Krasnyansky <maxk@...lcomm.com>
    Cc: pj@....com
    Cc: menage@...gle.com
    Cc: rostedt@...dmis.org
    Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>

gets rid of the BUG! (Added people to Ccs.)

Might I instead suggest a revert of this? (Again, unless somebody else
can spot the real error and fix it before 2.6.26 is out :-))
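The first issue described in the commit quoted above (an array allocated on every hotplug event with no matching free) is a classic leak. A minimal userspace sketch of that pattern and of the free-before-reallocate shape of the fix follows. It models only the first issue, and every identifier in it (doms_model, alloc_doms(), free_doms(), the two rebuild functions) is hypothetical rather than taken from sched.c.

```c
#include <stdlib.h>

/* Outstanding allocations, tracked so that the leak is observable. */
int live_allocs;

/* Hypothetical stand-in for the doms_cur array of domain masks. */
unsigned long *doms_model;

unsigned long *alloc_doms(size_t ndoms)
{
	unsigned long *d = calloc(ndoms, sizeof(*d));

	if (d)
		live_allocs++;
	return d;
}

void free_doms(unsigned long *d)
{
	if (d) {
		free(d);
		live_allocs--;
	}
}

/* Leaky pattern: each hotplug event allocates a fresh array and
 * overwrites the only pointer to the previous one. */
void rebuild_on_hotplug_leaky(void)
{
	doms_model = alloc_doms(1);
}

/* Fixed pattern: release the previous array before allocating anew. */
void rebuild_on_hotplug_fixed(void)
{
	free_doms(doms_model);
	doms_model = alloc_doms(1);
}
```

From a clean state, any number of calls to the fixed version leaves one allocation outstanding, while each call to the leaky version adds one more. The sketch says nothing about the second issue or about why reverting the commit makes the BUG go away.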


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036