Message-Id: <1190322532.3085.42.camel@chaos>
Date:	Thu, 20 Sep 2007 23:08:52 +0200
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Jaroslav Kysela <perex@...e.cz>,
	Takashi Iwai <tiwai@...e.de>,
	linux-usb-devel@...ts.sourceforge.net,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...l.org>, miklos@...redi.hu
Subject: Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when
	booted, USB-related WARNING

Rafael,

On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > Works as well. What's the difference between this and the real thing ?
> 
> The real thing also calls device_power_down(PMSG_FREEZE), which is a
> counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> belly up.
> 
> You can use the patch below (on top of -rc6-mm1), which just disables the image
> creation (that should be irrelevant anyway) and see what happens.

In the meantime I figured out what's happening. The ordering in
hibernation_snapshot() is wrong. It does:

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);

	disable_nonboot_cpus();

	swsusp_suspend();

	enable_nonboot_cpus();

	platform_finish(platform_mode);
	device_resume();
	resume_console();

We disable everything in device_suspend() including timekeeping, so any
code which is depending on working timekeeping and timer functionality
(which is suspended in timekeeping_suspend() as well) is busted.

enable_nonboot_cpus() definitely relies on working timekeeping and
timers, depending on the code path. It's just a surprise that this did
not blow up earlier (even before clock events).

I changed the ordering of the above to:

	disable_nonboot_cpus();

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	swsusp_suspend();
	platform_finish(platform_mode);
	device_resume();
	resume_console();

	enable_nonboot_cpus();

and unsurprisingly the "my VAIO needs help from keyboard" problem went
away immediately. See the patch below (on top of rc7-hrt1; -mm1 does not
work at all on my VAIO due to some not yet identified wreckage).
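
To make the ordering constraint concrete, here is a minimal user-space
sketch. It is not kernel code: every model_* function is a hypothetical
stand-in, and the only behaviour it models is the assumption stated
above, namely that device suspend takes timekeeping down and that
bringing the non-boot CPUs back up needs working timers.

```c
#include <stdbool.h>
#include <stdio.h>

/* User-space model only; these are stand-ins, not the kernel functions. */
static bool timekeeping_active = true;
static int violations;

/* Assumption from the mail: device suspend also suspends timekeeping. */
static void model_device_suspend(void) { timekeeping_active = false; }
static void model_device_resume(void)  { timekeeping_active = true; }

/* Assumption from the mail: CPU bring-up relies on timekeeping/timers. */
static void model_enable_nonboot_cpus(void)
{
	if (!timekeeping_active) {
		violations++;
		printf("enable_nonboot_cpus with timekeeping suspended\n");
	}
}

/* Old ordering: CPUs come back up while devices are still suspended. */
static int model_old_order(void)
{
	violations = 0;
	model_device_suspend();
	/* disable_nonboot_cpus(), snapshot would happen here */
	model_enable_nonboot_cpus();
	model_device_resume();
	return violations;
}

/* New ordering: CPU hotplug brackets the whole device-suspended window. */
static int model_new_order(void)
{
	violations = 0;
	/* disable_nonboot_cpus() would happen here */
	model_device_suspend();
	/* snapshot would happen here */
	model_device_resume();
	model_enable_nonboot_cpus();
	return violations;
}
```

Calling model_old_order() reports one violation in this model, while
model_new_order() reports none, which mirrors the reordering in the
patch.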

I have not yet looked into the suspend-to-RAM code, but I guess that
there is an equivalent problem.

But I have no idea why this affects Andrew's jinxed VAIO (UP machine),
though I suspect that we have more timekeeping/timer dependent code
somewhere waiting to bite us.

Also I still need to debug why the HIBERNATION_TEST code path (which has
an mdelay(5000) in it) does not fail, but I will postpone this until
tomorrow morning. I'm dead tired after hunting this Heisenbug, which
changes with every other printk added to the code. I'm going to add some
really noisy messages for everything which accesses timekeeping / timers
_after_ those systems have been shut down.
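
A rough idea of what such a noisy check could look like, again as a
user-space sketch rather than kernel code. The flag, the macro and
model_read_clock() are hypothetical names invented for illustration; a
real version would hook the actual timekeeping and timer entry points.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag, set when the modelled timekeeping is suspended. */
static bool timekeeping_suspended;
static unsigned int late_accesses;

/* Report the caller's location whenever the check fires. */
#define WARN_IF_TIMEKEEPING_SUSPENDED() \
	warn_if_suspended(__func__, __FILE__, __LINE__)

static bool warn_if_suspended(const char *func, const char *file, int line)
{
	if (!timekeeping_suspended)
		return false;
	late_accesses++;
	fprintf(stderr, "timekeeping accessed after suspend: %s() at %s:%d\n",
		func, file, line);
	return true;
}

/* Stand-in for a clock read: the value stops advancing once suspended. */
static long model_read_clock(void)
{
	static long fake_ticks;

	WARN_IF_TIMEKEEPING_SUSPENDED();
	return timekeeping_suspended ? fake_ticks : ++fake_ticks;
}
```

Every call to model_read_clock() made while timekeeping_suspended is set
prints one message naming the caller and bumps late_accesses, so the
offenders show up by name instead of as a silent hang.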

We really need to fix this once and forever _before_ 2.6.23 final, even
if it requires a -rc8.

Thanks,

	tglx

--- a/kernel/power/disk.c	2007-09-11 09:25:24.000000000 +0200
+++ b/kernel/power/disk.c	2007-09-20 22:47:30.000000000 +0200
@@ -130,10 +130,14 @@ int hibernation_snapshot(int platform_mo
 {
 	int error;
 
+	error = disable_nonboot_cpus();
+	if (error)
+		goto resume_cpus;
+
 	/* Free memory before shutting down devices. */
 	error = swsusp_shrink_memory();
 	if (error)
-		return error;
+		goto resume_cpus;
 
 	suspend_console();
 	error = device_suspend(PMSG_FREEZE);
@@ -144,23 +148,22 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Resume_devices;
 
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_mode != HIBERNATION_TEST) {
-			in_suspend = 1;
-			error = swsusp_suspend();
-			/* Control returns here after successful restore */
-		} else {
-			printk("swsusp debug: Waiting for 5 seconds.\n");
-			mdelay(5000);
-		}
+	if (hibernation_mode != HIBERNATION_TEST) {
+		in_suspend = 1;
+		error = swsusp_suspend();
+		/* Control returns here after successful restore */
+	} else {
+		printk("swsusp debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
 	}
-	enable_nonboot_cpus();
+
  Resume_devices:
 	platform_finish(platform_mode);
 	device_resume();
  Resume_console:
 	resume_console();
+resume_cpus:
+	enable_nonboot_cpus();
 	return error;
 }
 


