Date:	Mon, 23 Feb 2009 15:45:25 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	pm list <linux-pm@...ts.linux-foundation.org>,
	Len Brown <lenb@...nel.org>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during suspend-resume

On Monday 23 February 2009, Ingo Molnar wrote:
> 
> * Eric W. Biederman <ebiederm@...ssion.com> wrote:
> 
> > > What makes s2ram fragile is not human failure but the 
> > > combination of a handful of physical properties:
> > >
> > > 1) Psychology: shutting the lid or pushing the suspend button is 
> > >    a deceivingly 'simple' action to the user. But under the 
> > >    hood, a ton of stuff happens: we deinitialize a lot of 
> > >    things, we go through _all hardware state_, and we do so in a 
> > >    serial fashion. If just one piece fails to do the right 
> > >    thing, the box might not resume. Still, the user expects this 
> > >    'simple' thing to just work, all the time. No excuses 
> > >    accepted.
> > >
> > > 2) Length of code: To get a successful s2ram sequence the kernel
> > >    runs through tens of thousands of lines of code. Code which
> > >    never gets executed on a normal box - only if we s2ram. If 
> > >    just one step fails, we get a hung box.
> > >
> > > 3) Debuggability: a lot of s2ram code runs with the console off, 
> > >    making any bugs hard to debug. Furthermore we have no 
> > >    meaningful persistent storage either for kernel bug messages. 
> > >    The RTC trick of PM_DEBUG works but is a very narrow channel 
> > >    of information and it takes a lot of time to debug a bug via 
> > >    that method.
> > 
> > Yep that is an issue.
> 
> I'd also like to add #4:
> 
>      4) One more thing that makes s2ram special is that the
>         resume path often finds hardware in an even more
>         deinitialized form than during normal bootup. During
>         normal bootup the BIOS/firmware has at least done some
>         minimal bootstrap (to get the kernel loaded), which
>         makes life easier for the kernel.
> 
>         At s2ram stage we've got a completely pure hardware
>         init state, with very minimal firmware activation.

This is very true and at least in some cases done on purpose, AFAICS, due to
some timing constraints forced on HW vendors by M$, for example.

>         So many of the init and deinit problems and bugs we only 
>         hit in the s2ram path - a dynamic which is again not 
>         helpful.

Plus ACPI requires us to do additional things during suspend-resume that
are not done on boot-shutdown and which have their own ordering requirements
(not necessarily stated directly, but such that we have to discover them
experimentally).  Those also change from one BIOS to another.

> > > The combination of these factors really makes for a 
> > > perfect storm in terms of kernel technology: we have this 
> > > very-deceivingly-simple-looking but 
> > > complex-and-rarely-executed piece of code, which is very 
> > > hard to debug.
> > 
> > And much of this as you are finding with this piece of code is 
> > how the software was designed rather than how the software 
> > needed to be.
> 
> Well most of the 4 problems above are externalities and cannot 
> go away just by fixing the kernel.
> 
>  #1 will always be with us.
>  #3 needs the hardware to change. It's happening, but slowly.
>  #4 will be with us as long as there are non-Linux BIOSes.
> 
> #2 is the only thing where we can make a realistic difference,
> but there's only so much we can do there.
> 
> And that still leaves the other three items: each of which is 
> powerful enough of a force to give a bad name to any normal 
> subsystem.

Agreed.

Thanks,
Rafael
