Message-ID: <20090804214038.GA5141@count0.beaverton.ibm.com>
Date: Tue, 4 Aug 2009 14:40:38 -0700
From: Matt Helsley <matthltc@...ibm.com>
To: Paul Menage <menage@...gle.com>
Cc: "Serge E. Hallyn" <serue@...ibm.com>,
Benjamin Blum <bblum@...gle.com>,
containers@...ts.linux-foundation.org, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, Rafael Wysocki <rjw@...k.pl>,
Linux Power Management <linux-pm@...ts.linux-foundation.org>
Subject: Re: [PATCH 6/6] Makes procs file writable to move all threads by
tgid at once
[ Cc'ing Rafael and linux-pm for more eyes on proposed freezer usage. ]
On Mon, Aug 03, 2009 at 12:55:33PM -0700, Paul Menage wrote:
> On Mon, Aug 3, 2009 at 12:45 PM, Serge E. Hallyn<serue@...ibm.com> wrote:
> >
> > This is probably a stupid idea, but... what about having zero
> > overhead at clone(), and instead, at cgroup_task_migrate(),
> > dequeue_task()ing all of the affected threads for the duration of
> > the migrate?
>
> That doesn't sound too unreasonable, actually - it would certainly
> simplify things a fair bit. Is there a standard API for doing that?
I'm all for simplifying cgroup locking. I doubt anybody's against
it, given the "right" simplification.
I'm not sure if the freezer is actually the right thing to
use for this though. Perhaps CFS/scheduler folks could advise?
> dequeue_task() itself doesn't really look like a public API. I guess
> that the task freezer would be one way to accomplish this?
The freezer won't actually remove the task from the runqueue -- just
cause it to go into a schedule() loop until it's thawed.
[ Incidentally, sorry if this is a dumb question, but why don't frozen
tasks go onto a special wait queue rather than loop around schedule() ?
At least for the cgroup freezer I can imagine keeping the wait queue
with the cgroup subsystem... ]
The freezer sends a fake signal to the task which will interrupt syscalls
and userspace to handle the signal. So all of the frozen tasks would be
looping around schedule() just inside the syscall entry layer "handling"
the fake signal until they are thawed.
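For reference, the core of that hold loop looks roughly like this (a
paraphrased sketch of refrigerator() from kernel/freezer.c of this era,
not a verbatim quote):

```c
/* Sketch: the task parks itself here after "handling" the fake
 * signal, and bounces between schedule() calls until the thaw
 * path clears its frozen flag and wakes it.
 */
void refrigerator(void)
{
	long save = current->state;

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!frozen(current))	/* thawed? then we're done */
			break;
		schedule();
	}
	__set_current_state(save);
}
```

So "frozen" tasks are still schedulable entities parked in this loop; they
are not held on any special wait queue.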
This could interrupt a read of the cgroup pidlist for example.
I don't think it's 100% reliable -- a vfork'ing task can't be frozen until
its child execs or exits, so clueless/malicious userspace could delay
freezing "indefinitely".
However, the signal-delivery code the freezer uses calls kick_process(),
which may be needed for this idea anyway.
So if I understand correctly it goes something like:

	for each thread
		dequeue from runqueue onto ?what?
		kick thread (I think this should ensure that the thread
			is no longer "current" on any CPU since we
			dequeued..)

	<seems we'd need something to ensure that the previous operations
	 on each thread have "completed" as far as all other cpus are
	 concerned...>

	for each thread
		cgroup migrate

	for each thread
		enqueue back on runqueue from ?what? (is this still the
			right queue?)
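In kernel-ish pseudocode that sequence might look like the sketch below.
To be clear, this is hypothetical: deactivate_task()/activate_task() are
scheduler-internal, there is no such public API today, and the locking
and barrier choices here are hand-waved.

```c
/* HYPOTHETICAL sketch only -- not an existing API. */
for each thread t in the group:
	rq = task_rq_lock(t, &flags);
	deactivate_task(rq, t, 0);	/* off the runqueue... onto what list? */
	kick_process(t);		/* IPI so t stops being current anywhere */
	task_rq_unlock(rq, &flags);

/* need some barrier here so every CPU has seen the dequeues
 * before we touch the threads' cgroup state */

for each thread t in the group:
	cgroup_task_migrate(old_cg, new_cg, t);

for each thread t in the group:
	rq = task_rq_lock(t, &flags);	/* same rq as before? affinity
					 * may have changed meanwhile */
	activate_task(rq, t, 0);
	task_rq_unlock(rq, &flags);
```

The two open questions in the list above ("?what?" and "is this still the
right queue?") are exactly the parts this sketch can't answer.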
Cheers,
-Matt Helsley