Message-Id: <200704262237.32665.rjw@sisk.pl>
Date:	Thu, 26 Apr 2007 22:37:31 +0200
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	nigel@...el.suspend2.net
Cc:	Pekka Enberg <penberg@...helsinki.fi>, Pavel Machek <pavel@....cz>,
	Dumitru Ciobarcianu <Dumitru.Ciobarcianu@...s.ro>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Christian Hesse <mail@...thworm.de>,
	Nick Piggin <npiggin@...e.de>, Mike Galbraith <efault@....de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Con Kolivas <kernel@...ivas.org>,
	"suspend2-devel@...ts.suspend2.net" 
	<suspend2-devel@...ts.suspend2.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

On Thursday, 26 April 2007 22:16, Nigel Cunningham wrote:
> Hi.
> 
> On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote:
> > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote:
> > > 
> > > On 4/26/2007, "Rafael J. Wysocki" <rjw@...k.pl> wrote:
> > > > In principle, we could add suspend2 as an alternative (by analogy with the I/O
> > > > schedulers, for example), but I think for this purpose it should be reviewed
> > > > properly.
> > > 
> > > Yeah, this makes sense.
> > > 
> > > On 4/26/2007, "Rafael J. Wysocki" <rjw@...k.pl> wrote:
> > > > There is also a real problem with how it uses the LRU pages.  It _seems_ to
> > > > work, but at least to me it seems to be potentially dangerous.
> > > 
> > > I am new to suspend2, so can you please explain what exactly is dangerous
> > > about it?
> > 
> > After freezing tasks, it first saves the contents of the LRU pages, freezes
> > devices and then uses the LRU pages for storing the suspend image (if more
> > memory is needed, it's allocated, but that's irrelevant here).  Now, we have no
> > guarantee that the LRU pages are not updated after we've saved their contents
> > (first potential problem here).
> > 
> > After the image has been created, we have to unfreeze devices and save the
> > image.  Now, we have no guarantee that no one will write to the LRU pages
> > that we have used to store the image, for whatever reason, so the
> > image can potentially get corrupted while it's being saved.
> > 
> > In principle, device drivers can do this, and so can some kernel threads
> > (we don't freeze them, because they're needed for saving the image).
> > 
> > The design is conceptually really, really complicated and it makes strong
> > assumptions about the behavior of different subsystems.  While these
> > assumptions _may_ be satisfied right now, we'd have to ensure that they
> > remain satisfied in the future if suspend2 were merged.
> 
> That's a good description of the issue, although I think _may_ and
> _seems_ are stating things a bit more pessimistically than is
> necessary. 

I've used them to express my personal concerns.

> You see, we need to remember that the pages which are saved separately
> are LRU pages. Because userspace is frozen, their contents are going to
> be static. The only possibilities for modifying them come from timer
> routines, improperly frozen filesystems and device drivers.

And kernel threads that we deliberately don't freeze.  Currently, these are
all worker threads, dm-related kernel threads and some others.

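A toy model of the ordering described earlier in this thread may make the window concrete.  It is a hypothetical Python simplification, not suspend2 code: pages are bytearrays, and `stray_write` stands in for any driver or deliberately unfrozen kernel thread that still runs while the image is being written.

```python
# Toy model (hypothetical, not suspend2 code) of reusing LRU pages
# to hold the suspend image.
PAGE_SIZE = 16


def make_pages(count, fill):
    # Each "LRU page" is a small mutable buffer filled with one byte value.
    return [bytearray([fill]) * PAGE_SIZE for _ in range(count)]


def suspend_sequence(lru_pages, image_chunks, unfrozen_writer=None):
    """Return (saved LRU contents, image as it would be written out)."""
    # Step 1: tasks are frozen; save a copy of the LRU pages' contents.
    saved_lru = [bytes(page) for page in lru_pages]
    # Step 2: devices are frozen; reuse the LRU pages to hold the image.
    for page, chunk in zip(lru_pages, image_chunks):
        page[:] = chunk
    # Step 3: devices are thawed so the image can be written.  Anything that
    # is not frozen may still write to the reused pages in this window.
    if unfrozen_writer is not None:
        unfrozen_writer(lru_pages)
    written_image = [bytes(page) for page in lru_pages]
    return saved_lru, written_image


def stray_write(pages):
    # Hypothetical late write by a driver or an unfrozen kernel thread.
    pages[1][0] ^= 0xFF


image = [bytes([i]) * PAGE_SIZE for i in range(3)]
_, clean = suspend_sequence(make_pages(3, 0xAA), image)
_, dirty = suspend_sequence(make_pages(3, 0xAA), image, stray_write)
print(clean == image, dirty == image)  # True False
```

Nothing in steps 1 and 2 can detect the write in step 3; the model only shows that correctness depends on every unfrozen writer staying away from the reused pages.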
> We have code to check that the LRU isn't changing, and I've only seen
> one report of modifications to about 20 LRU pages. I haven't had the
> time yet to chase down the cause, but hope to do so soon.

I didn't say it would be common.  If it were, you'd have seen problems
with it.  To me, the problem is the lack of a guarantee that it won't happen.
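The LRU check Nigel mentions is not shown in this thread, so the following is only a hypothetical checksum-based sketch of that kind of check (the function names and the use of CRC32 are assumptions).  It illustrates the point above: a check catches changes made before it runs, but cannot prevent a write that lands afterwards.

```python
import zlib

# Hypothetical sketch of an "is the LRU changing?" check; not suspend2's code.


def checksum_pages(pages):
    # One CRC32 per page, recorded when the LRU contents are saved.
    return [zlib.crc32(bytes(page)) for page in pages]


def changed_pages(pages, checksums):
    # Indices of pages whose current contents no longer match their checksum.
    return [i for i, (page, crc) in enumerate(zip(pages, checksums))
            if zlib.crc32(bytes(page)) != crc]


pages = [bytearray(16) for _ in range(4)]
saved = checksum_pages(pages)

pages[1][3] = 0x7F                   # modified before the check runs
print(changed_pages(pages, saved))   # [1]

pages[3][0] = 0x01                   # modified after the check has passed:
# nothing reports this unless the check is repeated, and a write during
# the image write itself can land after the last possible check.
```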

> The general scheme has been working for four or five years; if there
> were a fundamental issue, we would have found it by now.
> 
> The scheme isn't complicated.

Conceptually, it is complicated just because you're using the LRU.

Greetings,
Rafael