linux-kernel - Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart


Message-ID: <20081030180133.GN15171@hawkmoon.kerlabs.com>
Date:	Thu, 30 Oct 2008 19:01:33 +0100
From:	Louis Rilling <Louis.Rilling@...labs.com>
To:	Dave Hansen <dave@...ux.vnet.ibm.com>
Cc:	Daniel Lezcano <dlezcano@...ibm.com>,
	containers@...ts.linux-foundation.org,
	Cedric Le Goater <clg@...ibm.com>,
	Andrey Mirkin <major@...nvz.org>, linux-kernel@...r.kernel.org
Subject: Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based
	checkpointing/restart

On Thu, Oct 30, 2008 at 10:08:44AM -0700, Dave Hansen wrote:
> On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and let the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
> 
> To me, this is one of the strongest arguments out there for doing
> restart as much as possible with existing user<->kernel APIs.  Having
> the kernel detect and clean up userspace's messes is not going to work.
> We might as well just do things in the kernel rather than do that.
> 
> What we *should* do is leverage all of the existing APIs that we already
> have instead of creating completely new code paths into which my butter
> fingers can introduce new kernel bugs.
> 
> > 2) restart will be more efficient with respect to shared objects.
> 
> Can you quantify this?  Which objects?  How much more efficient?

Quantify? No. I expect that investigating both approaches will show us numbers.
Unless Oren already has some?

Which objects? I think that two kinds will especially matter: objects usually
shared only inside a thread group (mm_struct, fs_struct, files_struct,
signal_struct and sighand_struct), and individual file descriptors. The point is
to avoid creating new structures only to destroy them again because the
restarted task actually shares them with a previously restarted one.

Concerning individual file descriptors, limiting the number of open files before
calling sys_restart() may avoid these useless creations/destructions (actually,
the "useless" work mainly consists of managing ref counts, since file
descriptors are shared after fork()).
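
To illustrate that last point from userspace (a minimal sketch, not code from
this thread): after fork() the child's descriptor refers to the same open file
as the parent's, so no new open file is created and only a reference count
changes. The helper name below is made up for the example, and the shared file
offset is just a convenient way to observe the sharing.

```python
import os
import tempfile


def offset_seen_by_parent_after_child_seek(seek_to=5):
    """Fork, let the child move the file offset, return what the parent sees."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)                      # the file lives on through fd only
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)         # parent rewinds to offset 0
        pid = os.fork()
        if pid == 0:
            # Child: the descriptor table was copied, but this entry still
            # points at the same open file as the parent's entry.
            try:
                os.lseek(fd, seek_to, os.SEEK_SET)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: read back the current offset without moving it.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(offset_seen_by_parent_after_child_seek(5))
```

The parent gets back the offset the child set, which is what one expects if
fork() only took an extra reference on the existing open file.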

Concerning thread-shared structures, it is probably easy for userspace to guess
which clone flags to use when restarting threads, but
1) kernel-space will have to check that the sharing is correct anyway, and
2) kernel-space will have to fix it anyway if structures are not shared in an
obvious manner between tasks (think about A creating B with shared files_struct,
B creating C with shared files_struct, B unsharing its files_struct, and then
checkpoint).
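
The A/B/C case can be modelled with a toy sketch. The Task class, its method
names, and the use of plain Python objects to stand in for files_struct are all
assumptions made for illustration; none of this is kernel code. It only shows
that, at checkpoint time, comparing each task with its parent does not reveal
that A and C still share.

```python
class Task:
    """Toy stand-in for a task; 'files' plays the role of its files_struct."""

    def __init__(self, name, parent=None, share_files=False):
        self.name = name
        self.parent = parent
        if parent is not None and share_files:
            self.files = parent.files    # roughly what clone(CLONE_FILES) gives
        else:
            self.files = object()        # a private files_struct

    def unshare_files(self):
        self.files = object()            # roughly what unshare(CLONE_FILES) gives


def build_scenario():
    """A creates B (shared), B creates C (shared), then B unshares."""
    a = Task("A")
    b = Task("B", parent=a, share_files=True)
    c = Task("C", parent=b, share_files=True)
    b.unshare_files()
    return a, b, c


def shares_files_with_parent(task):
    return task.parent is not None and task.files is task.parent.files


def sharing_groups(tasks):
    """Group task names by files_struct identity, in first-seen order."""
    groups = {}
    for t in tasks:
        groups.setdefault(id(t.files), []).append(t.name)
    return list(groups.values())


if __name__ == "__main__":
    a, b, c = build_scenario()
    print([shares_files_with_parent(t) for t in (a, b, c)])
    print(sharing_groups([a, b, c]))
```

In this model no task shares its files_struct with its parent, yet the sharing
groups are [A, C] and [B], so clone flags guessed from the parent/child
relation alone would restart three private structures.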

So, with a userspace implementation, useless structures will be created anyway,
and optimizing the common cases (regular threads) just duplicates the kernel's
work of checking which shared structure to use for each task to restart.
With a kernel-space implementation, all useless creations can be avoided, and no
duplicate work is needed.
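
The difference can be sketched with a toy model. The image format (a list of
task name and shared-object id pairs) and both function names are hypothetical,
chosen only to count creations; this is not how any existing checkpoint/restart
code is structured, and it says nothing about how expensive a creation really
is.

```python
def restart_with_table(image):
    """Create each shared object once, looking it up by its checkpointed id."""
    table = {}        # checkpointed object id -> restored object
    restored = {}     # task name -> restored object
    creations = 0
    for name, obj_id in image:
        if obj_id not in table:
            table[obj_id] = object()
            creations += 1
        restored[name] = table[obj_id]
    return restored, creations


def restart_create_then_fix(image):
    """Give every task a fresh object first, then swap in the shared one."""
    table = {}
    restored = {}
    creations = 0
    discarded = 0
    for name, obj_id in image:
        fresh = object()              # created unconditionally
        creations += 1
        if obj_id in table:
            discarded += 1            # the fresh object turned out useless
            restored[name] = table[obj_id]
        else:
            table[obj_id] = fresh
            restored[name] = fresh
    return restored, creations, discarded


if __name__ == "__main__":
    # Hypothetical image: A, C and D share object 1, B has its own object 2.
    image = [("A", 1), ("B", 2), ("C", 1), ("D", 1)]
    print(restart_with_table(image)[1])
    print(restart_create_then_fix(image)[1:])
```

With this image the table-based restart creates 2 objects, while the
create-then-fix restart creates 4 and throws 2 of them away. Both end up with
the same sharing, so the open question is only what the extra creations cost.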

That said, numbers may show us that useless creations are not so
time-consuming, but we won't know until we see them...

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
