Date:	Mon, 07 Dec 2009 09:35:00 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	tglx@...utronix.de, mingo@...e.hu, avi@...hat.com, efault@....de,
	rusty@...tcorp.com.au, linux-kernel@...r.kernel.org,
	Gautham R Shenoy <ego@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 4/7] sched: implement force_cpus_allowed()

On Mon, 2009-12-07 at 13:34 +0900, Tejun Heo wrote:
> Hello, Peter.
> 
> On 12/04/2009 07:43 PM, Peter Zijlstra wrote:
> >>> force_cpus_allowed() will be used for concurrency-managed workqueue.
> >>
> >> Would still like to know why all this is needed.
> > 
> > That is, what problem do these new-fangled workqueues have and why is
> > this a good solution.
> 
> This is the original RFC posting of cmwq which includes the whole
> thing.  I'm a few days away from posting a new version but the usage
> of force_cpus_allowed() remains the same.
> 
>   http://thread.gmane.org/gmane.linux.kernel/896268/focus=896294
> 
> There are two tests which are bypassed by the force_ variant.
> 
> * PF_THREAD_BOUND.  This is used to mark tasks which are bound to a
>   cpu using kthread_bind() as permanently bound.  However, the new
>   trustee-based workqueue hotplugging decouples per-cpu workqueue
>   flushing from cpu hot plug/unplugging.  This is necessary because
>   with cmwq, long running works can be served by regular workqueues,
>   so delaying completion of hot plug/unplugging till certain works
>   are flushed isn't feasible.  So, what becomes necessary is the
>   ability to re-bind tasks which have PF_THREAD_BOUND set but were
>   unbound from their now offline cpu when it comes online again.

I'm not at all sure I like that. I'd be perfectly happy with delaying
the hot-unplug.

The whole cpu hotplug mess is tricky enough as it is and I see no
compelling reason to further complicate it. If people are really going
to enqueue strict per-cpu worklets (queue_work_on()) that take seconds
to complete, then they get to keep the results of that, which includes
slow hot unplug.

Having an off-line cpu still process code as if it were online is asking
for trouble, don't go there.
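The force_ bypass Tejun describes above can be sketched in userspace C. This is illustrative only: the flag value, error code, cpu map, and the do_set_cpus_allowed() helper are stand-ins, not the kernel's actual implementation.

```c
/*
 * Sketch of the distinction under discussion: the regular affinity
 * path rejects PF_THREAD_BOUND tasks and inactive cpus, while a
 * force_ variant skips both tests so a worker that was unbound
 * during hot-unplug can be re-bound when the cpu comes back.
 * All values here are illustrative, not the kernel's.
 */
#include <assert.h>
#include <stdbool.h>

#define PF_THREAD_BOUND 0x1     /* illustrative, not the kernel value */
#define EINVAL 22

struct task {
    unsigned int flags;
    int cpu;
};

/* Pretend cpu 2 is in the middle of going down (inactive). */
static bool cpu_active_map[4] = { true, true, false, true };

static bool cpu_active(int cpu)
{
    return cpu_active_map[cpu];
}

static int do_set_cpus_allowed(struct task *t, int cpu, bool force)
{
    if (!force) {
        if (t->flags & PF_THREAD_BOUND)
            return -EINVAL;     /* permanently bound, refuse */
        if (!cpu_active(cpu))
            return -EINVAL;     /* cpu is going down, refuse */
    }
    t->cpu = cpu;               /* "migrate" the task */
    return 0;
}
```

With force=false a PF_THREAD_BOUND task can never be moved; the force_ variant is exactly the same migration with the two guards skipped.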

> * cpu_active() test.  CPU activeness is a subset of the cpu online
>   status, which is the actual on/offline state.  Workqueues need to
>   keep running while a cpu is going down, and with cmwq keeping
>   workqueues running involves creating new workers (this is part of
>   the forward-progress guarantee, and one of the cpu-down callbacks
>   might end up waiting on completion of certain works).
> 
>   The problem with the active state is that during cpu down, active
>   going off doesn't mean all tasks have been migrated off the cpu,
>   so without a migration interface which is synchronized with the
>   actual offline migration, it is difficult to guarantee that all
>   works are either running on the designated cpu if the cpu is
>   online or on other cpus if the cpu is offline.
> 
> Another related problem is that there's no way to receive
>   notifications of cpu activeness changes.

cpu_active() is basically meant for the scheduler to not stick new tasks
on a dying cpu.
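The online/active window being argued about can be modeled with two flags. This is a sketch under assumed names; the real kernel state is a pair of cpumasks updated through the hotplug notifier chain, not per-cpu booleans.

```c
/*
 * Model of the hot-unplug ordering: the cpu is marked !active first
 * (the scheduler stops placing new tasks there) while it is still
 * online and draining, and only goes !online at the end.  The
 * structures and helpers here are illustrative, not kernel code.
 */
#include <assert.h>
#include <stdbool.h>

struct cpu_state {
    bool online;
    bool active;
};

/* First step of hot-unplug: stop placing new work, stay online. */
static void cpu_deactivate(struct cpu_state *c)
{
    c->active = false;
}

/* Final step: tasks migrated off, cpu actually goes offline. */
static void cpu_take_offline(struct cpu_state *c)
{
    c->online = false;
}

/* Scheduler-style placement check: only active cpus get new tasks. */
static bool can_place_task(const struct cpu_state *c)
{
    return c->active;
}
```

The window Tejun worries about is the state after cpu_deactivate() and before cpu_take_offline(): the cpu is online but refuses new placements, and tasks may still be running on it.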

So on hot-unplug you'd want to splice your worklets to another cpu,
except maybe those strictly enqueued to the dying cpu, and since there
was work on the dying cpu, you already had a task processing them, so
you don't need new tasks, right?
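The splice Peter suggests can be sketched as a list operation: move the dying cpu's pending worklets to a surviving cpu's list, leaving behind only those strictly enqueued with queue_work_on(). The struct work here and splice_works() are hypothetical, not the workqueue implementation.

```c
/*
 * Sketch of splicing pending worklets off a dying cpu.  Singly
 * linked lists stand in for the real workqueue lists; the
 * strictly_bound flag marks work enqueued via queue_work_on().
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
    struct work *next;
    bool strictly_bound;    /* enqueued with queue_work_on() */
};

/* Move non-bound works from src to dst; return how many moved. */
static int splice_works(struct work **src, struct work **dst)
{
    int moved = 0;
    struct work **pp = src;

    while (*pp) {
        struct work *w = *pp;
        if (!w->strictly_bound) {
            *pp = w->next;      /* unlink from dying cpu's list */
            w->next = *dst;     /* push onto surviving cpu's list */
            *dst = w;
            moved++;
        } else {
            pp = &w->next;      /* leave strictly per-cpu work behind */
        }
    }
    return moved;
}
```

Whatever stays on the dying cpu's list is then drained by the worker already running there, which is Peter's point: no new tasks are needed for the remainder.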




