Date:	Thu, 2 Sep 2010 18:54:23 -0700
From:	Colin Cross <ccross@...roid.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Alan Stern <stern@...land.harvard.edu>,
	linux-kernel@...r.kernel.org, linux-pm@...ts.linux-foundation.org,
	Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] PM: Prevent waiting forever on asynchronous resume after abort

On Thu, Sep 2, 2010 at 5:35 PM, Rafael J. Wysocki <rjw@...k.pl> wrote:
> On Friday, September 03, 2010, Colin Cross wrote:
>> On Thu, Sep 2, 2010 at 4:09 PM, Rafael J. Wysocki <rjw@...k.pl> wrote:
>> > On Friday, September 03, 2010, Colin Cross wrote:
>> >> On Thu, Sep 2, 2010 at 2:34 PM, Alan Stern <stern@...land.harvard.edu> wrote:
>> >> > On Thu, 2 Sep 2010, Colin Cross wrote:
>> >> >
>> >> >> That would work, but I still don't see why it's better.  With either
>> >> >> of your changes, the power.completion variable is storing state, and
>> >> >> not just used for notification.  However, the exact meaning of that
>> >> >> state is unclear, especially during the transition from an aborted
>> >> >> suspend to resume, and the state is duplicating power.status.  Setting
>> >> >> it to complete in dpm_prepare is especially confusing, because at that
>> >> >> point nothing is completed, it hasn't even been started.
>> >> >
>> >> > The state being waited for varies from time to time and is only
>> >> > partially related to power.status.  Instead of using a completion I
>> >> > suppose we could have used a new "transition_complete" variable
>> >> > together with a waitqueue.  Would you prefer that?  It's effectively
>> >> > the same thing as a completion, but without the nice packaging already
>> >> > provided by the kernel.
>> >> No, that doesn't change anything.  What I'd prefer to see is a
>> >> wait_for_condition on the desired state of the parent.  As is,
>> >> power.completion means one thing during suspend (the device has
>> >> started, but not finished, suspending), and a different thing during
>> >> resume (the device has not finished resuming, and may not have started
>> >> resuming).  That difference is exactly what caused the bug - the
>> >> completion has to be set on init so that it is set before the device
>> >> starts suspend.
>> >
>> > Not really.  The bug is there, because my analysis of the suspend error code
>> > path was wrong.  Sorry about that, but it has nothing to do with the "different
>> > meaning" of the completions during suspend and resume.
>> >
>> > The completions here are simply used to enforce a specific ordering of
>> > operations, nothing more.  They have no meaning beyond that.
>>
>> The completion variable maintains state.
>
> So what?  Locks also maintain state.
>
>> It has meaning whether or not you want it to.  Leaving it as a completion
>> variable requires that you manage that state, which is difficult considering
>> there is no documentation and no clear idea in the code of exactly when that
>> state is set or clear.
>
> Please run "git show 5af84b82701a96be4b033aaa51d86c72e2ded061" and read the
> changelog.  It's described in there quite clearly (I think).
Yes, that is very clear; sorry I didn't see it before.  A simple
description closer to the code would have helped me.

>> It would be much cleaner to use a wait queue, and use
>> wait_on_condition to wait for the device to be in the desired state.
>
> Well, in fact that was used in one version of the patchset that introduced
> asynchronous suspend-resume, but it was rejected by Linus, because it was
> based on non-standard synchronization.  Linus' argument, which I agreed
> with, was that standard synchronization constructs, such as locks or
> completions, were guaranteed to work across different architectures and thus
> were simply _safer_ to use than the open-coded synchronization that you seem
> to prefer.
If I'm reading the right thread, that was using rwlocks, not
conditions.  wait_on_condition looks just as cross-architecture as
completions, and is almost as simple.

I look at it like this:  Are you waiting for something to complete, or
are you waiting for something to be in a specific state?  Completion
works great if you know that you will only want to wait after it
starts.  That's not true for an aborted suspend - you may call
dpm_wait on a device that has never started resuming, because it never
suspended.  You can use a completion, and make sure its state is
right for all the corner cases, but at the end of the day that's not
what you mean.  What you mean is "wait on the device to be resumed",
and that's a condition, not a simple completion event.
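
To make the distinction concrete, here is a rough userspace sketch of
"wait on the device to be resumed" as a condition, using pthreads rather
than kernel primitives.  Everything in it (struct fake_dev, the fake_*
helpers, demo_aborted_suspend) is hypothetical and only illustrates the
idea; it is not the PM core code, and in the kernel the equivalent would
presumably be built on a wait queue and wait_event().

```c
#include <pthread.h>

/* Hypothetical stand-in for a device's PM state; not the kernel's struct device. */
enum fake_state { FAKE_ACTIVE, FAKE_SUSPENDED };

struct fake_dev {
	pthread_mutex_t lock;
	pthread_cond_t changed;		/* plays the role of a wait queue */
	enum fake_state state;
};

static void fake_dev_init(struct fake_dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->changed, NULL);
	d->state = FAKE_ACTIVE;		/* a new device has never been suspended */
}

static void fake_dev_set_state(struct fake_dev *d, enum fake_state s)
{
	pthread_mutex_lock(&d->lock);
	d->state = s;
	pthread_cond_broadcast(&d->changed);	/* wake every waiter to re-check */
	pthread_mutex_unlock(&d->lock);
}

/* Wait on the state itself: returns at once if the device never suspended. */
static void fake_dev_wait_resumed(struct fake_dev *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->state != FAKE_ACTIVE)
		pthread_cond_wait(&d->changed, &d->lock);
	pthread_mutex_unlock(&d->lock);
}

/* Simulates an asynchronous resume of one device. */
static void *resume_worker(void *arg)
{
	fake_dev_set_state(arg, FAKE_ACTIVE);
	return NULL;
}

/* Aborted suspend: one device suspended, the other never got that far. */
int demo_aborted_suspend(void)
{
	struct fake_dev never_suspended, suspended;
	pthread_t t;

	fake_dev_init(&never_suspended);
	fake_dev_init(&suspended);
	fake_dev_set_state(&suspended, FAKE_SUSPENDED);

	/* No pre-set "completed" flag is needed for this not to block. */
	fake_dev_wait_resumed(&never_suspended);

	if (pthread_create(&t, NULL, resume_worker, &suspended) != 0)
		return -1;
	fake_dev_wait_resumed(&suspended);	/* blocks until the worker resumes it */
	pthread_join(t, NULL);

	return suspended.state == FAKE_ACTIVE ? 0 : -1;
}
```

The point of the sketch is that the waiter tests the state it actually
cares about, so a device that never suspended needs no special
initialization to avoid blocking its children forever.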

> Completions simply allowed us to get the desired behavior with the least
> effort and that's why we used them.
I'm happy with the end result, but I may submit a patch that converts
the completions to conditions for discussion.