Message-ID: <AANLkTi=gANgv=_PFTqNKy1vDddhYKZJ9nOKhvhOBJwqE@mail.gmail.com>
Date: Fri, 3 Sep 2010 09:48:00 -0700
From: Colin Cross <ccross@...roid.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: "Rafael J. Wysocki" <rjw@...k.pl>, linux-kernel@...r.kernel.org,
linux-pm@...ts.linux-foundation.org, Pavel Machek <pavel@....cz>,
Len Brown <len.brown@...el.com>,
Greg Kroah-Hartman <gregkh@...e.de>,
Randy Dunlap <randy.dunlap@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort
On Fri, Sep 3, 2010 at 7:04 AM, Alan Stern <stern@...land.harvard.edu> wrote:
> On Thu, 2 Sep 2010, Colin Cross wrote:
>
>> You're right, wait_event would be much worse.
>>
>> I think there's another race condition during suspend. If an
>> asynchronous device calls device_pm_wait_for_dev on a device that
>> hasn't had device_suspend called on it yet, power.completion will
>> still be set from initialization or the last time it completed resume,
>> and it won't wait.
>
> That can't happen in a properly-designed system. It would mean the
> async device didn't suspend because it was waiting for a device which
> was registered before it -- and that would deadlock even if you used
> synchronous suspend.
I see - from the earlier thread, if devices need to break the tree
model for suspend, they still have to follow the list ordering.
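To make sure I have the rule straight, the allowed pattern looks
something like this (the device names and the bar_dev pointer are made
up for illustration; "bar" is registered after "foo", so bar suspends
first and foo can safely wait for it):

#include <linux/device.h>
#include <linux/pm.h>

static struct device *bar_dev;		/* registered after foo; illustrative */

/* called from foo's probe path to opt in to async suspend/resume */
static void foo_enable_async(struct device *dev)
{
        device_enable_async_suspend(dev);
}

static int foo_suspend(struct device *dev)
{
        /*
         * Block until bar's suspend callback has finished.  bar must be
         * registered after foo; waiting on a device registered before
         * foo would deadlock even with synchronous suspend.
         */
        device_pm_wait_for_dev(dev, bar_dev);

        /* ... suspend foo's own hardware ... */
        return 0;
}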
>> Assuming that problem is fixed somehow, there's also a deadlock
>> possibility. Consider three devices, A, B, and C, registered in that
>> order. A is async, and its suspend handler calls
>> device_pm_wait_for_dev(C). B's suspend handler returns an error. A's
>> suspend handler is now stuck waiting on C->power.completion, but
>> device_suspend(C) will never be called.
>
> Why not? The normal suspend order is last-to-first, so C will be
> suspended before B.
Reverse A and C, but then the earlier comment applies.
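To make the ordering concrete: the core suspend loop, heavily
paraphrased from dpm_suspend() in drivers/base/power/main.c, walks the
list from the tail, so the last-registered device goes first, and an
error anywhere stops device_suspend() from ever being called for the
devices registered earlier:

/* heavily paraphrased, not the actual code */
while (!list_empty(&dpm_list)) {
        /* tail of dpm_list = most recently registered device */
        struct device *dev = to_device(dpm_list.prev);

        error = device_suspend(dev);    /* async devices are only queued */
        if (error)
                break;  /* earlier-registered devices never get suspended */

        list_move(&dev->power.entry, &suspended_list);
}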
>> There is also an unhandled edge condition - what is the expected
>> behavior for a call to device_pm_wait_for_dev on a device if the
>> suspend handler for that device returns an error? Currently, the
>> calling device will continue as if the target device had suspended.
>
> It looks like __device_suspend needs to set async_error. Which means
> async_suspend doesn't need to set it. This is indeed a bug.
Is this sufficient? The waiting device will complete its suspend
handler and then be resumed, but the device it waited on never
suspended. Are drivers expected to handle that case?
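For reference, an untested sketch of what I understand the suggestion
to be - record the error in async_error inside __device_suspend()
itself, next to where the completion is signalled, so both the sync and
async paths are covered:

/* untested sketch only; the real __device_suspend() does much more */
static int __device_suspend(struct device *dev, pm_message_t state, bool async)
{
        int error = 0;

        dpm_wait_for_children(dev, async);

        /* ... call the bus/type/class suspend callbacks, set error ... */

        complete_all(&dev->power.completion);

        if (error)
                async_error = error;    /* async_suspend() need not set it */

        return error;
}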
>> What about splitting power.completion into two flags,
>> power.suspend_complete and power.resume_complete?
>> power.resume_complete is initialized to 1, because the devices start
>> resumed. Clear power.suspend_complete for all devices at the
>> beginning of dpm_suspend, and clear power.resume_complete for any
>> device that is suspended at the beginning of dpm_resume. The
>> semantics of each flag are then always clear. Any time between the
>> beginning and end of dpm_suspend, waiting on any device's
>> power.suspend_complete will block until that device is in suspend.
>> Any time between the beginning and end of dpm_resume, waiting on
>> power.resume_complete will block IFF the device is suspended.
>
> How are you going to wait for these things? With wait_event? Didn't
> you say above that it would be worse than using completions?
I'd still use completions rather than wait_event, but complete them on
errors to abort any waiters. That complicates the meaning of the
completion flags, but keeps the rules above about the same.
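Roughly what I'm picturing, as a strawman only (the field and helper
names are made up, and this obviously doesn't build until the two
completions are actually added to struct dev_pm_info):

#include <linux/completion.h>
#include <linux/device.h>

/*
 * Strawman: replace power.completion with two completions in
 * struct dev_pm_info:
 *      struct completion suspend_complete;
 *      struct completion resume_complete;
 */

static void device_pm_init_completions(struct device *dev)
{
        init_completion(&dev->power.suspend_complete);
        init_completion(&dev->power.resume_complete);
        /* devices start out resumed, so resume_complete begins "done" */
        complete_all(&dev->power.resume_complete);
}

/* called at the beginning of dpm_suspend() */
static void dpm_reset_suspend_completions(void)
{
        struct device *dev;

        list_for_each_entry(dev, &dpm_list, power.entry)
                INIT_COMPLETION(dev->power.suspend_complete);
}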
>> A solution to the 2nd and 3rd problems would still be needed - a way
>> to abort drivers that call device_pm_wait_for_dev when suspend is
>> aborted, and a return value to tell them the device being waited on is
>> not suspended.
>
> No solutions are needed. See above.
>
> Alan Stern
>
>