Date:	Tue, 15 Jul 2008 02:00:32 +0200
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	"Linus Torvalds" <torvalds@...ux-foundation.org>
Cc:	"Vegard Nossum" <vegard.nossum@...il.com>,
	"Paul Menage" <menage@...gle.com>,
	"Max Krasnyansky" <maxk@...lcomm.com>, "Paul Jackson" <pj@....com>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>, miaox@...fujitsu.com,
	rostedt@...dmis.org, "Thomas Gleixner" <tglx@...utronix.de>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Linux Kernel" <linux-kernel@...r.kernel.org>
Subject: Re: current linux-2.6.git: cpusets completely broken

2008/7/15 Linus Torvalds <torvalds@...ux-foundation.org>:
>
> On Tue, 15 Jul 2008, Dmitry Adamushko wrote:
>>
>> cpu_clear(cpu, cpu_active_map); _alone_ does not guarantee that after
>> its completion, no new tasks can appear on (be migrated to) 'cpu'.
>
> But I think we should make it do that.
>
> I do realize that we "queue" processes, but that's part of the whole
> complexity. More importantly, the people who do that kind of asynchronous
> queueing don't even really care - *if* they cared about the process
> _having_ to show up on the destination core, they'd be waiting
> synchronously and re-trying (which they do).
>
> So by doing the test for cpu_active_map not at queuing time, but at the
> time when we actually try to do the migration, we can now also make
> that cpu_active_map be totally serialized.
>
> (Of course, anybody who clears the bit does need to take the runqueue lock
> of that CPU too, but cpu_down() will have to do that as it does the
> "migrate away live tasks" anyway, so that's not a problem)

The 'synchronization' point occurs even earlier - when cpu_down() ->
__stop_machine_run() gets called (as I described in my previous mail).
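
For reference, a minimal sketch of where that point sits in cpu_down()
(illustrative only, not the actual kernel/cpu.c; __stop_machine_run()
and take_cpu_down() as in the current tree, 'tcd_param' standing in for
the real argument block, everything else elided):

	/* sketch: the ordering inside cpu_down(), heavily simplified */
	int err;

	cpu_clear(cpu, cpu_active_map);	/* step 1: refuse new migrations */

	/*
	 * step 2: __stop_machine_run() executes take_cpu_down() while all
	 * other CPUs spin in the stopmachine threads, so no task can be
	 * in the middle of scheduling or migrating while the CPU goes
	 * away; this is the scheduling-wise synchronization point.
	 */
	err = __stop_machine_run(take_cpu_down, &tcd_param, cpu);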

My point was this: if it is OK to have a _delayed_ synchronization
point, i.e. not immediately after cpu_clear(cpu, cpu_active_map) but a
bit later, when the "runqueue lock" is taken (as you pointed out above)
or when __stop_machine_run() gets executed (which is a synchronization
point, scheduling-wise), then we can implement the proper
synchronization between hotplugging and task migration with
cpu_online_map alone; there is no need for cpu_active_map.
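
A minimal sketch of that delayed check at a migration site (again
illustrative, not a patch; cpu_rq(), activate_task() and the rq lock as
in kernel/sched.c, with 'p' and 'dest_cpu' assumed from the caller):

	struct rq *rq = cpu_rq(dest_cpu);

	spin_lock_irq(&rq->lock);
	if (cpu_offline(dest_cpu)) {
		/*
		 * cpu_down() must take this same runqueue lock while it
		 * migrates live tasks away, so this test is race-free:
		 * either we see the CPU offline here, or the hotplug
		 * side sees (and migrates away) the task we are adding.
		 */
		spin_unlock_irq(&rq->lock);
		return -EAGAIN;		/* caller picks another CPU */
	}
	activate_task(rq, p, 0);	/* the actual migration step */
	spin_unlock_irq(&rq->lock);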

Note that currently _not_ all places in the scheduler where an actual
migration takes place (as opposed to just queuing a request) test for
cpu_offline(). Instead, they blindly rely on the assumption that if a
cpu is visible via the sched-domains, it is guaranteed to be online
(and can be migrated to).
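
Roughly, the sketch below shows the pattern (first_cpu() and sd->span
as in the current cpumask API); the first line is the blind trust in
the domain, the test after it is the one that is currently missing:

	/* today: trust the domain contents, no liveness check */
	dest_cpu = first_cpu(sd->span);

	/* the missing test, done at the actual migration: */
	if (cpu_offline(dest_cpu))
		goto out;	/* domain was stale; don't migrate */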

Provided all those places additionally tested cpu_offline(), the bug
discussed in this thread would _not_ happen; moreover, we would _not_
need all the fancy "attach NULL domains" sched-domain manipulations
(which depend on DOWN_PREPARE, DOWN and other hotplugging events). We
would only need to rebuild the domains once, upon a successful
CPU_DOWN.
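
As a sketch of that simplification (illustrative; the real notifier is
update_sched_domains() in kernel/sched.c, and arch_init_sched_domains()
is static there, so take the names with a grain of salt):

	static int hotplug_cb(struct notifier_block *nb,
			      unsigned long action, void *hcpu)
	{
		switch (action) {
		case CPU_DEAD:			/* CPU_DOWN succeeded */
		case CPU_DEAD_FROZEN:
			/* rebuild domains once, from the new online map */
			arch_init_sched_domains(&cpu_online_map);
			break;
		/*
		 * No CPU_DOWN_PREPARE / "attach NULL domains" cases
		 * needed: the cpu_offline() tests at the migration
		 * sites make stale domain contents harmless in the
		 * window before the rebuild.
		 */
		}
		return NOTIFY_OK;
	}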

p.s. I hope my point is more understandable now (or else it's clear
that I'm missing something at this late hour :^)


>
>                Linus
>

-- 
Best regards,
Dmitry Adamushko
