linux-kernel - Re: [patch, rfc: 2/2] sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()


Message-ID: <b647ffbd0807261249v3aa9a94dy2882f9be98f83465@mail.gmail.com>
Date:	Sat, 26 Jul 2008 21:49:33 +0200
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Cc:	"Ingo Molnar" <mingo@...e.hu>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch, rfc: 2/2] sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()

2008/7/25 Peter Zijlstra <a.p.zijlstra@...llo.nl>:
> On Fri, 2008-07-25 at 15:20 +0200, Dmitry Adamushko wrote:
>> 2008/7/25 Peter Zijlstra <a.p.zijlstra@...llo.nl>:
>> > On Fri, 2008-07-25 at 00:15 +0200, Dmitry Adamushko wrote:
>> >>
>> >> From: Dmitry Adamushko <dmitry.adamushko@...il.com>
>> >> Subject: sched, hotplug: ensure a task is on the valid cpu after
>> >> set_cpus_allowed_ptr()
>> >>
>> >> ---
>> >>     sched, hotplug: ensure a task is on the valid cpu after set_cpus_allowed_ptr()
>> >>
>> >>     The 'new_mask' may not include task_cpu(p), so we migrate 'p' to another 'cpu'.
>> >>     In case it can't be placed on this 'cpu' immediately, we submit a request
>> >>     to the migration thread and wait for its completion.
>> >>
>> >>     Now, by the time this request gets handled by the migration_thread,
>> >>     'cpu' may well be offline/non-active. As a result, 'p' continues
>> >>     running on its old cpu, which is not in the 'new_mask'.
>> >>
>> >>     Fix it: ensure 'p' ends up on a valid cpu.
>> >>
>> >>     Theoretically (but unlikely), we may get an endless loop if someone cpu_down()'s
>> >>     the new cpu we have chosen on each iteration.
>> >>
>> >>     Alternatively, we may introduce a special type of request to migration_thread,
>> >>     namely "move_to_any_allowed_cpu" (e.g. by specifying dest_cpu == -1).
>> >>
>> >>     Note, any_active_cpu() instead of any_online_cpu() would be better here.
>> >
>> > Hrmm,.. this is all growing into something of a mess.. defeating the
>> > whole purpose of introducing that cpu_active_map stuff.
>> >
>> > Would the suggested SRCU logic simplify all this?
>>
>> Ah, wait a second.
>>
>> sched_setaffinity() -> set_cpus_allowed_ptr() is ok vs. cpu_down() as
>> it does use get_online_cpus(). So none of the cpus can become offline
>> while we are in set_cpus_allowed_ptr().
>>
>> but there are numerous calls to set_cpus_allowed_ptr() from other
>> places and not all of them seem to call get_online_cpus()...
>>
>> yeah, I should check this issue again..
>>
>> btw., indeed all these different sync. cases are a bit of a mess.
>
> Will ponder it a bit more, but my brain can't seem to let go of SRCU
> now..

I like it too.
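
For reference, the retry described in the patch description quoted above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the patch itself: cpus are bits in an unsigned long rather than a cpumask_t, all model_* names are hypothetical, and hotplug is replaced by a scripted list of cpus that go offline between choosing a destination and the migration request being handled.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/*
 * Hypothetical stand-in for any_online_cpu(): lowest cpu present in both
 * the allowed mask and the online mask, or -1 if there is none.
 */
static int model_any_online_cpu(unsigned long allowed, unsigned long online)
{
    unsigned long both = allowed & online;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (both & (1UL << cpu))
            return cpu;
    return -1;
}

/*
 * Model of the retry: keep choosing a destination until the task sits on
 * a cpu from new_mask. offline_script[i] is the cpu that goes offline in
 * round i, between choosing the destination and the migration request
 * being handled (-1 means nothing goes offline in that round).
 * Returns the final cpu, or -1 if no allowed cpu is left online.
 */
static int model_set_cpus_allowed(int task_cpu, unsigned long new_mask,
                                  unsigned long *online,
                                  const int *offline_script, int script_len)
{
    int round = 0;

    for (;;) {
        if (new_mask & (1UL << task_cpu))
            return task_cpu;        /* already on a valid cpu */

        int dest = model_any_online_cpu(new_mask, *online);
        if (dest < 0)
            return -1;              /* nothing left to migrate to */

        /* The race window: a scripted cpu_down() may hit here. */
        if (round < script_len && offline_script[round] >= 0)
            *online &= ~(1UL << offline_script[round]);
        round++;

        if (*online & (1UL << dest))
            task_cpu = dest;        /* request handled, task moved */
        /* else: dest went offline, the task stayed put; choose again */
    }
}
```

With a script that takes down the chosen cpu in every round, the model only stops once the allowed cpus run out, which is the (unlikely) endless-loop concern from the description; a "move_to_any_allowed_cpu" request would move the choice of destination into the migration thread instead.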

> I'll go concentrate on making the swap-over-nfs patches prettier,
> maybe that will induce a brainwave ;-)

What about task migration over NFS? ;-)


>> btw., I was wondering about this change:
>>
>> ba42059fbd0aa1ac91b582412b5fedb1258f241f
>>
>> sched: hrtick_enabled() should use cpu_active()
>>
>> Peter pointed out that hrtick_enabled() should use cpu_active().
>
> What exactly were you wondering about?
>
> It seemed a good idea to stop starting hrtimers before we migrate them
> to another cpu (one of the things done later in cpu_down), thereby
> avoiding spurious fires on remote cpus.
>

Yeah, I thought that it's likely cpu_down() related.

I looked at it from the point of view of cpu_up(): e.g. a cpu goes
online -> tasks get queued and start running (while the cpu is still
_not_ active for a while). So when they get enqueued for the first time,
hrtick_enabled() will return 0 and the hr-timer won't be used.
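
A minimal userspace sketch of that window, assuming a one-bit-per-cpu model. The struct and all model_* names are hypothetical, and the real hrtick_enabled() checks more than cpu_active(); only the online-but-not-yet-active ordering is modelled here.

```c
#include <stdbool.h>

/* Hypothetical model: one bit per cpu. The kernel uses cpumask_t. */
struct cpu_maps {
    unsigned long online;   /* models cpu_online_map */
    unsigned long active;   /* models cpu_active_map */
};

static bool model_cpu_online(const struct cpu_maps *m, int cpu)
{
    return (m->online >> cpu) & 1UL;
}

static bool model_cpu_active(const struct cpu_maps *m, int cpu)
{
    return (m->active >> cpu) & 1UL;
}

/* Models hrtick_enabled() gated on cpu_active(); other checks omitted. */
static bool model_hrtick_enabled(const struct cpu_maps *m, int cpu)
{
    return model_cpu_active(m, cpu);
}

/* First step of the modelled cpu_up(): tasks can be queued from here on. */
static void model_cpu_set_online(struct cpu_maps *m, int cpu)
{
    m->online |= 1UL << cpu;
}

/* Later step: only now does the cpu count as active. */
static void model_cpu_set_active(struct cpu_maps *m, int cpu)
{
    m->active |= 1UL << cpu;
}
```

Between model_cpu_set_online() and model_cpu_set_active(), model_cpu_online() is already true while model_hrtick_enabled() is still false, so a task enqueued in that window would not get an hr-timer.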

Actually, cpu_active_map has already broken some expectations/assumptions -
http://lkml.org/lkml/2008/7/24/260 (in case you have missed it). But
this particular behavior of "microcode" is really bad, I think.


-- 
Best regards,
Dmitry Adamushko