lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200709202345.01682.rjw@sisk.pl>
Date:	Thu, 20 Sep 2007 23:45:00 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Jaroslav Kysela <perex@...e.cz>,
	Takashi Iwai <tiwai@...e.de>,
	linux-usb-devel@...ts.sourceforge.net,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...l.org>, miklos@...redi.hu
Subject: Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING

Thomas,

On Thursday, 20 September 2007 23:08, Thomas Gleixner wrote:
> Rafael,
> 
> On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > > Works as well. What's the difference between this and the real thing ?
> > 
> > The real thing also calls device_power_down(PMSG_FREEZE), which is a
> > counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> > belly up.
> > 
> > You can use the patch below (on top of -rc6-mm1), which just disables the image
> > creation (that should be irrelevant anyway) and see what happens.
> 
> In meantime I figured out what's happening. The ordering in
> hibernate_snapshot() is wrong. It does:
> 
> 	swsusp_shrink_memory();
>         suspend_console();
>         device_suspend(PMSG_FREEZE);
>         platform_prepare(platform_mode);
> 
> 	disable_nonboot_cpus();
> 
>         swsusp_suspend();
> 
> 	enable_nonboot_cpus();
> 
> 	platform_finish(platform_mode);
>         device_resume();
>         resume_console();
> 
> We disable everything in device_suspend()

No, we don't.  sysdevs are _not_ suspended in device_suspend().
They are suspended in device_power_down(), which is called
_after_ disable_nonboot_cpus() (from swsusp_suspend()).

> including timekeeping,

No, the timekeeping is suspended in device_power_down() (or at least it should
be).

> so any  code which is depending on working timekeeping and timer
> functionality (which is suspended in timekeeping_suspend() as well) is
> busted. 
> 
> enable_nonboot_cpus() definitely relies on working timekeeping and
> timers depending on the codepath. It's just a surprise that this did not
> blow up earlier (also before clock events).
> 
> I changed the ordering of the above to:
> 
> 	disable_nonboot_cpus();
> 
> 	swsusp_shrink_memory();
>         suspend_console();
>         device_suspend(PMSG_FREEZE);
>         platform_prepare(platform_mode);
>         swsusp_suspend();
> 	platform_finish(platform_mode);
>         device_resume();
>         resume_console();
> 
> 	enable_nonboot_cpus();

Actually, we can't do this here, because of ACPI and some interrupt handling
related problems.  Unfortunately, platform_finish() needs to go _after_
enable_nonboot_cpus() and device_resume() needs to go after platform_finish().
Analogously, disable_nonboot_cpus() has to go after platform_prepare().

Otherwise, some systems will break.

> and non-surprisingly the "my VAIO needs help from keyboard" problem went
> away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
> work at all on my VAIO due to some yet not identified wreckage)

Hm, I really don't know why it helps, but that's not because of the timekeeping
suspend, IMO.

> I did not yet look into the suspend to ram code, but I guess that there
> is an equivalent problem.

Yes, the code ordering is the same, but it's not totally wrong, IMHO.

> But I have no idea why this affects Andrews jinxed VAIO (UP machine),
> though I suspect that we have more timekeeping/timer depending code
> somewhere waiting to bite us.

That's possible.

> Also I still need to debug why the HIBERNATION_TEST code path (which has
> a msleep(5000) in it) does not fail,

See above. :-)

> but I postpone this until tomorrow morning. I'm dead tired after hunting
> this Heisenbug which changes with every other printk added to the code.
> I'm going to add some really noisy messages for everything which accesses
> timekeeping / timers _after_ those systems have been shut down.
> 
> We really need to fix this once and forever _before_ 2.6.23 final, even
> if it requires a -rc8.

Agreed.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ