Message-ID: <49BADAE5.8070900@cs.columbia.edu>
Date:	Fri, 13 Mar 2009 18:15:01 -0400
From:	Oren Laadan <orenl@...columbia.edu>
To:	Dave Hansen <dave@...ux.vnet.ibm.com>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-api@...r.kernel.org, containers@...ts.linux-foundation.org,
	mpm@...enic.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	tglx@...utronix.de, viro@...iv.linux.org.uk, hpa@...or.com,
	mingo@...e.hu, Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>,
	Alexey Dobriyan <adobriyan@...il.com>, xemul@...nvz.org
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ
 do?



Dave Hansen wrote:
> On Fri, 2009-03-13 at 14:01 -0700, Linus Torvalds wrote:
>> On Fri, 13 Mar 2009, Alexey Dobriyan wrote:
>>>> Let's face it, we're not going to _ever_ checkpoint any kind of general 
>>>> case process. Just TCP makes that fundamentally impossible in the general 
>>>> case, and there are lots and lots of other cases too (just something as 
>>>> totally _trivial_ as all the files in the filesystem that don't get rolled 
>>>> back).
>>> What do you mean here? Unlinked files?
>> Or modified files, or anything else. "External state" is a pretty damn 
>> wide net. It's not just TCP sequence numbers and another machine.
> 
> This is precisely the reason that we've focused so hard on containers,
> and *didn't* just jump right into checkpoint/restart; we're trying
> really hard to constrain the _truly_ external things that a process can
> interact with.  
> 
> The approach so far has largely been to make things that are external to a
> process at least *internal* to a container.  Network, pid, ipc, and uts
> namespaces, for example.  An ipc/sem.c semaphore may be external to a
> process, so we'll just pick the whole namespace up and checkpoint it
> along with the process.
> 
> In the OpenVZ case, they've at least demonstrated that the filesystem
> can be moved largely with rsync.  Unlinked files need some in-kernel TLC
> (or /proc mangling) but it isn't *that* bad.

And in Zap we have successfully used a log-structured filesystem
(specifically NILFS) to continuously snapshot the file system, atomically
with taking each checkpoint, so we can easily branch off from past
checkpoints, file system state included.

And unlinked files can be (inefficiently) handled by saving their full
contents with the checkpoint image - it's not a big toll on many apps
(if you exclude Wine and UML...). At least that's a start.
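
To make the "save the full contents" idea concrete, here is a rough
userspace sketch. It assumes Linux semantics, where an open file whose
last name has been removed reports st_nlink == 0. The function names and
the record layout are made up for illustration; this is not how Zap or
OpenVZ do it, and a real c/r implementation would do the equivalent
in-kernel.

```python
import os
import tempfile


def save_if_unlinked(fd):
    """Return a record for fd if its file has no names left, else None.

    Hypothetical helper: the record layout is an assumption for this sketch.
    """
    st = os.fstat(fd)
    if st.st_nlink != 0:
        return None  # still reachable by name; the fs snapshot covers it
    offset = os.lseek(fd, 0, os.SEEK_CUR)  # remember the current position
    chunks = []
    pos = 0
    while True:
        buf = os.pread(fd, 65536, pos)  # pread does not move the file offset
        if not buf:
            break
        chunks.append(buf)
        pos += len(buf)
    return {"data": b"".join(chunks), "offset": offset}


def restore_unlinked(record, directory=None):
    """Recreate an unlinked file from a record and return a new fd."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)  # drop the name right away to match the saved state
    view = memoryview(record["data"])
    while view:
        written = os.write(fd, view)  # os.write may write less than asked
        view = view[written:]
    os.lseek(fd, record["offset"], os.SEEK_SET)  # restore the position
    return fd
```

The cost is the inefficiency mentioned above: the whole file body travels
with the checkpoint image, which is why large unlinked files hurt.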

> 
> We can also make the fs problem much easier by using things like dm or
> btrfs snapshotting of the block device, or restricting to where on a fs
> a container is allowed to write with stuff like r/o bind mounts.

(or NILFS)

So we argue that FS snapshotting is related to c/r, but orthogonal to it
in terms of implementation.

Oren.

