linux-kernel - Re: Back to the future.

Message-ID: <alpine.LFD.0.98.0704260941310.9964@woody.linux-foundation.org>
Date:	Thu, 26 Apr 2007 09:56:58 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Nigel Cunningham <nigel@...el.suspend2.net>
cc:	Pekka Enberg <penberg@...helsinki.fi>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Back to the future.

On Thu, 26 Apr 2007, Nigel Cunningham wrote:
> 
> * Doing things in the right order? (Prepare the image, then do the
> atomic copy, then save).

I'd actually like to discuss this a bit..

I'm obviously not a huge fan of the whole user/kernel level split and 
interfaces, but I actually do think that there is *one* split that makes 
sense:

 - generate the (whole) snapshot image entirely inside the kernel

 - do nothing else (ie no IO at all), and just export it as a single image 
   to user space (literally just mapping the pages into user space). 
   *one* interface. None of the "pretty UI update" crap. Just a single 
   system call:

	void *snapshot_system(u32 *size);

   which will map in the snapshot, return the mapped address and the size 
   (and if you want to support snapshots > 4GB, be my guest, but I suspect 
   you're actually *better* off just admitting that if you cannot shrink 
   the snapshot to a size that fits in 32 bits, it's not worth doing).

User space gets a fully running system, with that one process having that 
one image mapped into its address space. It can then compress/write/do 
whatever to that snapshot.

You need one other system call, of course, which is

	int resume_snapshot(void *snapshot, u32 size);

and for testing, you should be able to basically do

	u32 size;
	void *buffer = snapshot_system(&size);
	if (buffer != MAP_FAILED)
		resume_snapshot(buffer, size);

and it should obviously work.

And btw, the device model changes are a big part of this. Because I don't 
think it's even remotely debuggable with the full suspend/resume of the 
devices being part of generating the image! That freeze/snapshot/unfreeze 
sequence is likely a lot more debuggable, if only because freeze/unfreeze 
is actually a no-op for most devices, and snapshotting is trivial too.
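
A rough sketch of what a freeze/snapshot/unfreeze pass over devices could look like, written as plain C rather than kernel code. Everything here is an assumption for illustration: the struct names, the hook signatures, treating a NULL hook as a no-op, and unfreezing in reverse order are not taken from the device model or from any posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device hooks. A NULL hook means "nothing to do",
 * which is the expected case for most devices. */
struct snap_dev_ops {
	int  (*freeze)(void *dev);
	int  (*snapshot)(void *dev);
	void (*unfreeze)(void *dev);
};

struct snap_dev {
	const struct snap_dev_ops *ops;
	void *dev;
};

/* Freeze every device, snapshot every device, then unfreeze whatever
 * was frozen (in reverse order). Returns 0 or the first hook error. */
int snapshot_devices(struct snap_dev *devs, size_t n)
{
	size_t i, frozen = 0;
	int ret = 0;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (devs[i].ops->freeze) {
			ret = devs[i].ops->freeze(devs[i].dev);
			if (ret)
				break;	/* devices 0..i-1 are frozen */
		}
		frozen++;
	}

	for (i = 0; !ret && i < n; i++)
		if (devs[i].ops->snapshot)
			ret = devs[i].ops->snapshot(devs[i].dev);

	while (frozen-- > 0)
		if (devs[frozen].ops->unfreeze)
			devs[frozen].ops->unfreeze(devs[frozen].dev);

	return ret;
}
```

The point of the shape is that a failure at any step still ends with every frozen device unfrozen, so each phase can be tested on its own.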

Once you have that snapshot image in user space you can do anything you 
want. And again: you'd have a fully working system: not any degradation 
*at*all*. If you're in X, then X will continue running etc even after the 
snapshotting, although obviously the snapshotting will have tried to page 
a lot of stuff out in order to make the snapshot smaller, so you'll likely 
be crawling.

> * Multithreaded I/O (might as well use multiple cores to compress the
> image, now that we're hotplugging later).
> * Support for > 1 swap device.
> * Support for ordinary files.
> * Full image option.
> * Modular design?

I'd really suggest _just_ the "full image". Nothing else is probably ever 
worth supporting. Your "snapshot to disk" wouldn't be _quite_ as simple as 
"echo disk > /sys/power/state", but it should not necessarily be much 
worse than

	snapshot_kernel | gzip -9 > /dev/snapshot

either (and resuming from the snapshot would just be the reverse)!
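
For the pipeline above, about the only thing a snapshot_kernel-style tool has to do once the image is mapped is push the bytes to stdout and let gzip do the rest. A minimal sketch of that write loop follows; write_image() is a hypothetical helper name, and the u32 size simply mirrors the prototype proposed earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

typedef uint32_t u32;

/* Write the whole image to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_image(int fd, const void *image, u32 size)
{
	const unsigned char *p = image;
	u32 left = size;

	assert(image != NULL || size == 0);

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (n == 0)
			return -1;		/* no progress, give up */
		p += n;
		left -= (u32)n;
	}
	return 0;
}
```

A tool built on this would call write_image(STDOUT_FILENO, buffer, size) on the mapped snapshot and exit non-zero on failure, so the shell pipeline can tell a truncated image from a good one.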

And if you want to send the snapshot over a TCP connection to another 
host, be my guest. With pretty images while it's transferring. Whatever.

			Linus