lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 16 Jul 2008 13:29:09 -0700
From:	Max Krasnyansky <maxk@...lcomm.com>
To:	Dmitry Adamushko <dmitry.adamushko@...il.com>
CC:	mingo@...e.hu, a.p.zijlstra@...llo.nl,
	torvalds@...ux-foundation.org, pj@....com, ghaskins@...ell.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo
 sched domain managment (take 2)



Dmitry Adamushko wrote:
> 2008/7/15 Max Krasnyansky <maxk@...lcomm.com>:
>> From: Max Krasnyanskiy <maxk@...lcomm.com>
>>
>> Addressed Ingo's comments and merged on top of latest Linus's tree.
> 
> a few remarks:
> 
> (1) in __migrate_task(), a test for !cpu_active(dest_cpu) should be
> done after double_rq_lock() [ or add the second one ]
> 
> migration_thread() calls __migrate_task() with disabled interrupts (no
> rq-locks held), i.e. if we merely rely on rq-locks for
> synchronization, this can race with cpu_down(dest_cpu).
> 
> [ assume, the test was done in __migration_task() and it's about to
> take double_lock()... and at this time, down_cpu(dest_cpu) starts and
> completes on another CPU ]
> 
> note, we may still take the rq-lock for a "dead" cpu in this case and
> then only do a check (remark: in fact, not with stop_machine() in
> place _but_ I consider that we don't make any assumptions on its
> behavior);
Hmm, as you suggested I added synchronize_sched() after clearing the active
bit (see below). Is that not nought enough ? I mean you mentioned that
stop_machine syncs things up, I assume synchronize_sched does too.

I guess testing inside the double_rq_lock() does not hurt anyway. We already
have fail recovery path there. But are you sure it's needed given the explicit
sync (in fact we have double sync now :), one with synchronize_sched() and
then with the stop_machine)).

> (2) it's worth to take a look at the use of any_online_cpu():
> 
> many places are ok, because there will be an additional check against
> cpu_active_mask later on, but e.g.
> 
> set_cpus_allowed_ptr() ->
> migrate_task(p, any_online_cpu(mask), ...) ->
> migrate_task(p, dest_cpu)
> 
> doesn't seem to have any verifications wrt cpu_active_map.
How about we just introduce any_active_cpu() and replace all the usages of
any_online_cpu() in the scheduler ?

> (3) I assume, we have some kind of explicit sched_sync() after
> cpu_clear(cpu, cpu_active_mask) because:
> 
> (a) not all places where task-migration may take place do take the
> rq-lock for dest_cpu : e.g. try_to_wake_up() or even
> sched_migrate_task() [ yes, there is a special (one might call
> "subtle") assumption/restriction in this case ]
> 
> that's why the fact that cpu_down() takes the rq-lock for
> soon-to-be-offline cpu at some point can not be a "natural" sync.
> point to guarantee that "stale" cpu_active_map is not used.
> 
> (b) in fact, stop_machine() acts as a (very strong) sync. point,
> sched-wise. But perhaps, we don't want to have this new easy-to-follow
> approach to be built on top of assumptions on how something from
> another sub-system behaves.
Yep. As you suggested I've added synchronize_sched() and updated the comment
that explains the deal with the stop machine.
	http://lkml.org/lkml/2008/7/15/736
Peter, already ACKed it.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ