Date:	Tue, 22 Jul 2008 12:32:03 -0700
From:	Max Krasnyansky <maxk@...lcomm.com>
To:	Gregory Haskins <ghaskins@...ell.com>
CC:	Peter Zijlstra <a.p.zijlstra@...llo.nl>, mingo@...e.hu,
	dmitry.adamushko@...il.com, torvalds@...ux-foundation.org,
	pj@....com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain management (take 2)

Gregory Haskins wrote:
> Max Krasnyansky wrote:
>> Greg, correct me if I'm wrong, but we seem to have the exact same
>> issue with the rq->rd->online map. Let's take "cpu going down" for
>> example. We're clearing the rq->rd->online bit on the DYING event,
>> but nothing AFAICS prevents another cpu from calling
>> rebuild_sched_domains() -> partition_sched_domains() in the middle of
>> the hotplug sequence. partition_sched_domains() will happily reset
>> the rq->rd->online mask and things will fail. I'm talking about this
>> path:
>>
>> __build_sched_domains() -> cpu_attach_domain() -> rq_attach_root()
>>     ...
>>     cpu_set(rq->cpu, rd->span);
>>     if (cpu_isset(rq->cpu, cpu_online_map))
>>         set_rq_online(rq);
>>     ...
>>
>>   
> 
> I think you are right, but wouldn't s/online/active above fix that as 
> well?  The active_map didn't exist at the time that code went in 
> initially ;)
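
For reference, the substitution you're suggesting would look roughly like
this in rq_attach_root() (sketch only, untested):

	cpu_set(rq->cpu, rd->span);
	if (cpu_isset(rq->cpu, cpu_active_map))	/* was: cpu_online_map */
		set_rq_online(rq);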

Actually, after a bit more thinking :) I realized that the scenario I 
described above cannot happen, because partition_sched_domains() must be 
called under get_online_cpus() while set_rq_online() happens in the 
hotplug writer's path (i.e. under cpu_hotplug.lock). Since I unified all 
the other domain-rebuild paths (arch_reinit_sched_domains(), etc.), we 
should be safe. But that again means we'd be relying on exactly the kind 
of intricate dependencies we wanted to avoid with cpu_active_map. Also, 
cpusets might still need to rebuild the domains in the hotplug writer's 
path.
So it's better to fix it once and for all :)
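
To spell out the ordering I'm relying on: every domain-rebuild path is
supposed to look roughly like this (illustrative only, with placeholder
arguments, not an actual call site):

	/* readers pin the hotplug state, so the rebuild cannot overlap
	 * a hotplug writer that holds cpu_hotplug.lock and calls
	 * set_rq_online() */
	get_online_cpus();
	partition_sched_domains(ndoms, doms, attr);
	put_online_cpus();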

>> BTW, why didn't we convert sched*.c to use rq->rd->online when it was
>> introduced? I.e., instead of using cpu_online_map directly.
> I think things were converted where they made sense to convert.  But we 
> also had a different goal at that time, so perhaps something was 
> missed.  If you think something else should be converted, please point 
> it out.
Ok. I'll keep an eye on it.

> In the meantime, I would suggest we consider this patch on top of yours 
> (applies to tip/sched/devel):
> 
> ----------------------
> 
> sched: Fully integrate cpus_active_map and root-domain code
>   Signed-off-by: Gregory Haskins <ghaskins@...ell.com>

Ack.
The only thing I'm a bit unsure about is the error scenarios in the CPU 
hotplug event sequence: online_map is not cleared when something in the 
notifier chain fails, but active_map is.
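
For illustration, the asymmetry I mean is roughly this in the hotplug
notifier sequence (hypothetical sketch, not the actual patch):

	case CPU_DOWN_PREPARE:
		/* the active bit goes away before the notifier chain runs */
		cpu_clear(cpu, cpu_active_map);
		break;
	case CPU_DOWN_FAILED:
		/* if a later notifier fails we'd have to put the bit back;
		 * the online bit never needs this, since it is only
		 * cleared after __cpu_disable() succeeds */
		cpu_set(cpu, cpu_active_map);
		break;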

Max