[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0912070737150.3560@localhost.localdomain>
Date: Mon, 7 Dec 2009 08:31:39 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Alan Stern <stern@...land.harvard.edu>
cc: Zhang Rui <rui.zhang@...el.com>, "Rafael J. Wysocki" <rjw@...k.pl>,
LKML <linux-kernel@...r.kernel.org>,
ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
pm list <linux-pm@...ts.linux-foundation.org>
Subject: Re: [GIT PULL] PM updates for 2.6.33
On Mon, 7 Dec 2009, Alan Stern wrote:
> >
> > I dunno. Maybe I'm overlooking something, but the above is much closer to
> > what I think would be worth doing.
>
> You're overlooking resume. It's more difficult than suspend. The
> issue is that a child can't start its async part until the parent's
> synchronous part is finished.
No, I haven't overlooked resume at all. I just assumed that it was
obvious. It's the exact same thing, except in reverse (the locking ends
up being slightly different, but the changes are actually fairly
straightforward).
And by reverse, I mean that you walk the tree in the reverse order too,
exactly like we already do - on suspend we walk it children-first, on
resume we walk it parents-first (small detail: we actually just walk a
simple linked list, but the list is topologically ordered, so walking it
forwards/backwards is topologically the same thing as doing that
depth-first search).
> So for example, suppose the device listing contains P, C, Q, where C is
> a child of P, Q is unrelated, and P has a long-lasting asynchronous
> requirement. The resume process will stall upon reaching C, waiting
> for P to finish. Thus even though P and Q might be able to resume in
> parallel, they won't get the chance.
No. The resume process does EXCTLY THE SAME THING as I outlined for
suspend, but just all in reverse. So now the resume process becomes the
same two-phase thing:
# Phase one
resume(root)
{
// This can do things asynchronously if it wants,
// but needs to take the write lock on itself until
// it is done if it does
resume_one_node(root);
for_each_child(root)
resume(child);
}
# Phase two
post_resume(root)
{
post_resume_one_node(root);
for_each_child(root)
post_resume(child);
}
Notice? It's _exactly_ the same thing as suspend - except all turned
around. We do the nodes before the children ("walk the list backwards"),
and we also do the locking the other way around (ie on suspend we'd lock
the _parent_ if we wanted to do async stuff - to keep it around - but on
resume we lock _ourselves_, so that the children can have something to
wait on. Also note how we take a _write_ lock rather than a read lock).
(And again, I've only written it out in email, I've not tested it or
thought about it all that deeply, so you'll excuse any stupid thinkos.)
Now, for something like PCI, I'd suggest (once more) leaving all drivers
totally unchanged, and you end up with the exact same behavior as we had
before (no real change to the whole resume ordering, and everything is
synchronous so there is no relevant locking).
But how would the USB layer do this?
Simple: all the normal leaf devices would have their resume callback be
called at "post_resume()" time (exactly the reverse of the suspend phase:
we suspend early, and we resume late - it's all a mirror image). And I'd
suggest that the USB layer do it all totally asynchronously, except again
turned around the other way.
Remember how on suspend, the suspend of a leaf device ended up being an
issue of asynchronously calling a function that did the suspend, and then
released the read-lock of the parent. Resume is the same, except now we'd
actually want to take the parent read-lock asynchronously too, so you'd do
down_write(leaf->lock);
async_schedule(usb_node_resume, leaf);
where that function simply does
usb_node_resume(node)
{
/* Wait for the parent to have resumed completely */
down_read(node->parent->lock);
node->resume(node)
up_read(node->parent->lock);
up_write(node->lock);
}
and you're all done. Once more the ordering and the locking takes care of
any need to serialize - there is no data structures to keep track of.
And what about USB hubs? They get resumed in the first phase (again,
exactly the mirror image of the suspend), and the only thing they need to
do is that _exact_ same thing above:
down_write(hub->lock);
async_schedule(usb_node_resume, hub);
- Ta-daa! All done.
Notice? It's really pretty straightforward, and there are _zero_ new
concepts. And again, no callbacks, no nothing. Just the obvious mirror
image of what happened when suspending. We do everything with simple async
calls. And none of the tree walking actually blocks (yes, we do a
"down_write()" on the nodes as we schedule the resume code, but it won't
be a blocking one, since that is the first time we encounter that node:
the blocking will come later when the async threads actually need to wait
for things).
Again, I do not guarantee that I've dotted every i, and crossed every t.
It's just that I'm pretty sure that we really don't need any fancy
"infrastructure" for something this simple. And I really much prefer
"conceptually simple high-level model" over a model of "keep track of all
the relationships and have some complex model of devices".
So let's just look at your example:
> So for example, suppose the device listing contains P, C, Q, where C is
> a child of P, Q is unrelated, and P has a long-lasting asynchronous
> requirement.
The tree is:
... -> P -> C
-> Q
and with what I suggest, during phase one, P will asynchronously start the
resume. As part of its async resume it will have to wait for it's parents,
of course, but all of that happens in a separate context, and the tree
traversal goes on.
And during phase #1, C and Q won't do anything at all. We _could_ do them
during this phase, and it would actually all work out fine, but we
wouldn't want to do that for a simple reason: we _want_ the pre_suspend
and post_resume phases to be total mirror images, because if we end up
doing error handling for the pre-suspend case, then the post-resume phase
would be the "fixup" for it, so we actually want leaf things to happen
during phase #2 - not because it would screw up locking or ordering, but
because of other issues.
When we hit phase #2, we then do C and Q, and do the same thing - we have
an async call that does the read-lock on the parent to make sure it's
all resumed, and then we resume C and Q. And they'll automatically resume
in parallel (unless C is waiting for P, of course, in which case P and Q
end up resuming in parallel, and C ends up waiting).
Now, the above just takes care of the inter-device ordering. There are
unrelated semantics we want to give, like "all devices will have resumed
before we start waking up user space". Those are unrelated to the
topological requirements, of course, and are not a requirement imposed by
the device tree, but by our _other_ semantics (IOW, in this respect it's
kind of like how we wanted pre-suspend and post-resume to be mirror images
for other outside reasons).
So we'd actually have a "phase #3", but that phase wouldn't be visible to
the devices themselves, it would be a
# Phase tree: make sure everything is resumed
for_each_device() {
read_lock(dev->lock);
read_unlock(dev->lock);
}
but as you can see, there's no actual device callbacks involved. It would
be just the code device layer saying "ok, now I'm going to wait for all
the devices to have finished their resume".
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists