Message-ID: <1275698169.10045.8.camel@maxim-laptop>
Date: Sat, 05 Jun 2010 03:36:09 +0300
From: Maxim Levitsky <maximlevitsky@...il.com>
To: Nigel Cunningham <ncunningham@...a.org.au>
Cc: Pavel Machek <pavel@....cz>,
pm list <linux-pm@...ts.linux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
TuxOnIce-devel <tuxonice-devel@...onice.net>
Subject: Re: [SUSPECTED SPAM] Re: [linux-pm] Proposal for a new algorithm
for reading & writing a hibernation image.
On Sat, 2010-06-05 at 09:58 +1000, Nigel Cunningham wrote:
> Hi Maxim.
>
> On 05/06/10 09:39, Maxim Levitsky wrote:
> > On Thu, 2010-06-03 at 16:50 +0200, Pavel Machek wrote:
> >>
> >> "Nigel Cunningham"<ncunningham@...a.org.au> wrote:
> >>
> >>> Hi.
> >>>
> >>> On 30/05/10 15:25, Pavel Machek wrote:
> >>>> Hi!
> >>>>
> >>>>> 2. Prior to writing any of the image, also set up new 4k page tables
> >>>>> such that an attempt to make a change to any of the pages we're about to
> >>>>> write to disk will result in a page fault, giving us an opportunity to
> >>>>> flag the page as needing an atomic copy later. Once this is done, write
> >>>>> protection for the page can be disabled and the write that caused the
> >>>>> fault allowed to proceed.
> >>>>
> >>>> Tricky.
> >>>>
> >>>> page faulting code touches memory, too...
> >>>
> >>> Yeah. I realise we'd need to make the pages that are used to record the
> >>> faults be unprotected themselves. I'm imagining a bitmap for that.
> >>>
> >>> Do you see any reason that it could be inherently impossible? That's
> >>> what I really want to know before (potentially) wasting time trying it.
> >>
> >> I'm not sure it is impossible, but it certainly seems way too complex to be
> >> practical.
> >>
> >> 2mb pages will probably present a problem, as will bat mappings on powerpc.
> >
> >
> > Some time ago, after tuxonice caused medium fs corruption twice on my
> > root filesystem (superblock gone for example), I was thinking too about
> > how to make it safe to save whole memory.
>
> I'd be asking why you got the corruption. On the odd occasion where it
> has been reported, it's usually been because the person didn't set up
> their initramfs correctly (resumed after mounting filesystems). Is there
> any chance that you did that?
>
> > Your tuxonice is so fast that it resembles suspend to ram.
>
> That depends on hard drive speed and CPU speed. I've just gotten a new
> SSD drive, and can understand your statement now, but I wouldn't have
> said the same beforehand.
Nope, I have a slow laptop drive.
>
> > I have a radically different proposal.
> >
> >
> > Let's create a kind of self-contained, very small operating system that
> > will know how to do just one thing: write the memory to disk.
> > From now on I am calling this OS a suspend module.
> > Physically, its code can be contained in the Linux kernel or loaded as a
> > module.
> >
> >
> > Let's see how things will work first:
> >
> > 1. Linux loads the suspend module to memory (if it is inside kernel
> > image, that becomes unnecessary)
> >
> > At that point, it's even possible to add some user plug-ins to that
> > module for example to draw splash screen. Of course all such plug-ins
> > must be root approved.
> >
> >
> > 2. Linux turns off all devices, but hard disk.
> > Drivers for hard drives will register for this exception.
> >
> >
> > 3. Linux creates a list of memory areas to save (or exclude from save,
> > doesn't matter)
> >
> > 4. Linux creates a list of hard disk sectors that will contain the
> > image.
> > This ensures support for swap partition and swap files as well.
> >
> > 5. Linux allocates small 'scratch space'
> > Of course if memory is very tight, some swapping can happen, but that
> > isn't significant.
> >
> >
> > 6. Linux creates new page tables that cover: the suspend module, both of
> > above lists, scratch space, and (optionally) the framebuffer RW,
> > and rest of memory RO.
> >
> > 7. Linux switches to new page table, and passes control to that module.
> > Even if the module wanted to it won't be able to change system memory.
> > It won't even know how to do so.
> >
> > 8. Module optionally encrypts and/or compresses (and saves result to
> > scratch page)
> >
> > 9. Module uses very simplified disk drivers to write the memory to disk.
> > These drivers can even omit using interrupts because there is nothing
> > else to do.
> > It can also draw progress bar on framebuffer using optional plugin
> >
> > 10. Module passes control back to linux, which just shuts system off.
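To make steps 3, 4, 8 and 9 a bit more concrete, here is a rough userspace
sketch of the module's main loop. Everything in it is hypothetical and only
for illustration: struct mem_area, save_image(), the write_sector hook and the
in-memory fake_disk are names I made up, and a real module would talk to the
controller directly instead of copying into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE  512
#define FAKE_SECTORS 16

/* One entry of the "memory areas to save" list (step 3). Hypothetical. */
struct mem_area {
    const unsigned char *start;
    size_t len;
};

/* Hypothetical polled driver hook: writes one sector, returns 0 on success. */
typedef int (*write_sector_fn)(unsigned long sector, const unsigned char *buf);

/* Stand-in for a real disk so the sketch can run in userspace. */
static unsigned char fake_disk[FAKE_SECTORS][SECTOR_SIZE];

static int fake_write_sector(unsigned long sector, const unsigned char *buf)
{
    if (sector >= FAKE_SECTORS)
        return -1;
    memcpy(fake_disk[sector], buf, SECTOR_SIZE);
    return 0;
}

/*
 * Walk the area list and write it, one sector at a time, to the
 * preallocated sector list (step 4). Data is staged in the scratch
 * buffer (step 5), which is where compression or encryption would
 * happen (step 8). Returns the number of sectors written, or -1 if
 * the sector list is too short or a write fails.
 */
static long save_image(const struct mem_area *areas, size_t n_areas,
                       const unsigned long *sectors, size_t n_sectors,
                       unsigned char *scratch, write_sector_fn write_sector)
{
    size_t used = 0;

    for (size_t i = 0; i < n_areas; i++) {
        for (size_t off = 0; off < areas[i].len; off += SECTOR_SIZE) {
            size_t chunk = areas[i].len - off;

            if (chunk > SECTOR_SIZE)
                chunk = SECTOR_SIZE;
            if (used == n_sectors)
                return -1;
            memset(scratch, 0, SECTOR_SIZE);      /* pad the last sector */
            memcpy(scratch, areas[i].start + off, chunk);
            if (write_sector(sectors[used], scratch) != 0)
                return -1;
            used++;
        }
    }
    return (long)used;
}
```

The point of the sketch is that the loop only reads system memory and only
writes to the scratch buffer, so it would keep working with the rest of
memory mapped RO.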
>
> Sounds a lot like kexec based hibernation that was suggested a year or
> two back. Have you thought about resuming, too? That's the trickier part
> of the process.
Why is it tricky?
We can just reserve, say, 25 MB of memory and make the resuming kernel use
only that for all its needs.
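Roughly what I have in mind, as a userspace sketch only: every allocation the
resuming kernel makes comes out of one fixed arena, so it cannot scribble over
memory the image will be restored into. The names (reserved, resume_alloc,
in_reserved) are made up for this example, and the static array merely stands
in for a reserved physical range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESERVED_BYTES (25u * 1024u * 1024u)

/* Stand-in for the reserved 25 MB physical range (hypothetical). */
static unsigned char reserved[RESERVED_BYTES];
static size_t reserved_used;

/*
 * Bump allocator: hands out memory only from the reserved arena.
 * 'align' must be a power of two. Returns NULL when the arena is full.
 */
static void *resume_alloc(size_t size, size_t align)
{
    uintptr_t base = (uintptr_t)reserved;
    uintptr_t p = (base + reserved_used + align - 1) & ~((uintptr_t)align - 1);
    size_t start = (size_t)(p - base);

    if (start > RESERVED_BYTES || size > RESERVED_BYTES - start)
        return NULL;
    reserved_used = start + size;
    return reserved + start;
}

/* True if p points into the reserved arena. */
static int in_reserved(const void *p)
{
    const unsigned char *c = p;

    return c >= reserved && c < reserved + RESERVED_BYTES;
}
```

There is deliberately no free(): the resuming kernel is thrown away once the
image has been loaded.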
>
> > Now what code will be in the module:
> >
> > 1. Optional compression & encryption - easy
> > 2. Draw modules, also optional and easy
> >
> >
> > 3. New disk drivers.
> > This is the hard part, but if we cover libata and ahci, we will cover
> > the common case.
> > Other cases can be handled by existing code that saved 1/2 of ram.
>
> To my mind, supporting only some hardware isn't an option.
>
> > 4. Arch-specific code. Since it doesn't deal with interrupts or memory
> > management, it won't be a lot of code.
> > Again, standard swsusp can be used for arches that the module wasn't
> > ported to.
>
> Perhaps I'm being a pessimist, but it sounds to me like this is going to
> be a way bigger project than you're allowing for.
I think so too. This is just an idea.
To add a comment on your idea:
I think it is possible to use page faults to see which memory regions
changed. Actually, it is a very interesting idea.
You just need to install your own page fault handler, and make sure it
doesn't write to any protected memory.
Of course, the sucky part will be how to edit the page tables.
You might need to write your own code for that, to be sure.
And this has to be arch-specific.
Since userspace is frozen, you can be sure that faults can only be
caused by writes to RO memory or by kernel bugs.
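The same trick can be tried out in userspace, which at least shows the
mechanism. This is only an analogy, not kernel code: it assumes Linux,
mprotect() plays the role of the new page tables, a SIGSEGV handler plays the
role of the page fault handler, and the dirty[] array is the fault log that
must itself stay writable. All names here (region, dirty, tracking_start,
poke) are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 8

static unsigned char *region;                 /* memory being "imaged" */
static size_t page_size;
static volatile unsigned char dirty[NPAGES];  /* fault log, never protected */

/* Fault handler: flag the page, lift write protection, let the write retry. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)region;
    size_t idx;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + NPAGES * page_size)
        _exit(1);                             /* not our region: a real bug */
    idx = (addr - base) / page_size;
    dirty[idx] = 1;
    /* Not formally async-signal-safe per POSIX, but works on Linux. */
    mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map the region, install the handler, then write-protect every page. */
static int tracking_start(void)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memset(region, 0, NPAGES * page_size);
    memset((void *)dirty, 0, sizeof dirty);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(region, NPAGES * page_size, PROT_READ);
}

/* Write one byte at the start of a page; volatile keeps the store in order. */
static void poke(size_t page, unsigned char value)
{
    *(volatile unsigned char *)(region + page * page_size) = value;
}
```

After tracking_start(), a poke(3, 0xAA) faults once, sets dirty[3] and then
completes; later writes to the same page cost nothing. Pages whose flag is
set are the ones that would need the atomic copy later.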
Best regards,
Maxim Levitsky