Date:	Wed, 11 Nov 2009 14:29:10 +0100
From:	Ferenc Wagner <wferi@...f.hu>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	linux-pm@...ts.linux-foundation.org,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	yakui.zhao@...el.com, LKML <linux-kernel@...r.kernel.org>,
	ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
	Len Brown <lenb@...nel.org>
Subject: Re: [linux-pm] intermittent suspend problem again

"Rafael J. Wysocki" <rjw@...k.pl> writes:

> On Wednesday 11 November 2009, Ferenc Wagner wrote:
>
>> "Rafael J. Wysocki" <rjw@...k.pl> writes:
>> 
>>> On Thursday 29 October 2009, Ferenc Wagner wrote:
>>>
>>>> "Rafael J. Wysocki" <rjw@...k.pl> writes:
>>>> 
>>>>> On Wednesday 28 October 2009, Ferenc Wagner wrote:
>>>>> 
>>>>>> 2.6.32-rc5 feels particularly bad, with frequent failures to switch
>>>>>> off the machine after "S|" or freezes after "Snapshotting system".
>>>>>> The former does not cause much trouble in itself, as the machine can
>>>>>> be switched off and resumed all right, but the latter is nasty.
>>>>>> Suspend to RAM works all the time.  The issue is not reproducible,
>>>>>> unfortunately, and the kernel change happened almost together with a
>>>>>> BIOS upgrade.  Yesterday I switched back to 2.6.31 to see whether it
>>>>>> still works stably with the new BIOS.  I'll report back my findings in
>>>>>> a couple of days.
>>>>>
>>>>> OK, thanks.
>>>>>
>>>>> Still, I'm really afraid we won't be able to debug it any further without a
>>>>> reproducible test case.
>>>>
>>>> Can't you perhaps suggest a way forward there?  Or some tricks to create a
>>>> reproducible test case here?
>>>
>>> Well, you can test if the problem is reproducible in the "shutdown" mode of
>>> hibernation.
>> 
>> Well, both failure modes happen with "shutdown" mode as well (the S|
>> freeze with yesterday's git, too), but still not reproducibly.  When
>> s2disk is stuck in "Snapshotting system", the system is not completely
>> dead, it echoes line feeds and Ctrl-C at least (as added to #14504).
>> 
>> I wonder what you did if the issue was reproducible...  Is that totally
>> unapplicable if the problem happens with 10% probability only?  Slow,
>> sure, but until I manage to set up an automated testing bench...
>
> I would try to identify the commit that made the problem appear using git
> bisection.  However, this is really difficult with problems that are not
> reliably reproducible.

Indeed.  I'm thinking about setting up a script that does nothing but
hibernate the laptop in a loop, and getting my router to provide a
constant stream of WOL packets to restart it.  If it always freezes within
a bounded time, that will make bisecting possible, if slow.
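
A minimal sketch of the router-side half of that idea, plus a rough
estimate of what "bounded time" means for bisection.  Everything here is
an assumption for illustration: the function names are made up, UDP port
9 is only a common convention for WOL, and the estimate assumes that each
hibernation cycle fails independently with a fixed probability.

```python
import math
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + raw * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast one magic packet over UDP (the port number is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


def cycles_needed(failure_rate: float, confidence: float) -> int:
    """Clean cycles to run before calling a kernel good with the given confidence.

    Assumes independent cycles: a bad kernel survives n cycles with
    probability (1 - failure_rate) ** n, so we pick the smallest n that
    pushes this below 1 - confidence.
    """
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under that independence assumption, a 10% failure rate means 44 clean
cycles per bisection step for 99% confidence (cycles_needed(0.1, 0.99)),
which is slow but bounded.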

> Failing that, I would add some instrumentation to the code to identify the
> exact place where it hangs.

I managed to achieve this with my STR problem, see
http://bugs.freedesktop.org/show_bug.cgi?id=22126#c17, but maybe that line,

    status = acpi_evaluate_object(NULL, METHOD_NAME__PTS, &arg_list, NULL);

wasn't deep enough, as it got no follow-up.  How deep should one go to be
useful?

I can probably do so again, if slower; but this case may also be easier
if I can depend on working console output.  Which are the interesting
parts for instrumentation?  Can those parts produce console output to
VGA or netconsole?  Wouldn't switching on ACPI debugging before invoking
s2disk be useful?  Which parts of it (to avoid it spitting out MBs of
useless characters)?

> BTW, did you carry out the /sys/power/pm_test "core" test on the box?

I'm not clear on how to do that with user space suspend.  Do I simply set
it to "core" before invoking s2disk?  I already did the test for STR (see
http://bugs.freedesktop.org/show_bug.cgi?id=22126#c3), but will redo it
with the current kernel tonight.
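
For reference, a small sketch of selecting a test level through
/sys/power/pm_test.  It assumes the file lists the available levels with
the active one in square brackets (for example "[none] core processors
platform devices freezer"); the helper names are hypothetical, and whether
the setting is honoured on the user space suspend path is exactly the open
question above.

```python
from pathlib import Path
from typing import List, Tuple

PM_TEST = Path("/sys/power/pm_test")


def parse_pm_test(text: str) -> Tuple[str, List[str]]:
    """Split '[none] core processors' into ('none', ['none', 'core', 'processors'])."""
    current = ""
    levels = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active level
            current = token
        levels.append(token)
    return current, levels


def set_pm_test(level: str, path: Path = PM_TEST) -> None:
    """Select a test level, refusing names the kernel did not advertise."""
    _, levels = parse_pm_test(path.read_text())
    if level not in levels:
        raise ValueError(f"{level!r} not offered; choices are {levels}")
    path.write_text(level)  # needs root on a real system
```

Checking the requested name against what the kernel advertises would have
caught a mistyped level before the write.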
-- 
Thanks,
Feri.