linux-kernel - Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch


Message-ID: <Pine.LNX.4.64.1011201247090.15662@takamine.ncl.cs.columbia.edu>
Date:	Sat, 20 Nov 2010 12:58:07 -0500 (EST)
From:	Oren Laadan <orenl@...columbia.edu>
To:	Tejun Heo <tj@...nel.org>
cc:	Kirill Korotaev <dev@...allels.com>,
	Kapil Arya <kapil@....neu.edu>,
	Pavel Emelianov <xemul@...allels.com>,
	Gene Cooperman <gene@....neu.edu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Linux Containers <containers@...ts.osdl.org>
Subject: Re: [Ksummit-2010-discuss] checkpoint-restart: naked patch

On Fri, 19 Nov 2010, Tejun Heo wrote:

> Hello,
> 
> On 11/19/2010 03:36 PM, Kirill Korotaev wrote:
> > Can you imagine how many userland APIs are needed to make userspace C/R?
> > 
> > Do you really want APIs in user-space which allow to:
> > - send signals with siginfo attached (kill() doesn't work...)
> 
> Doesn't rt_sigqueueinfo() already do this?
> 
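For reference, the interface named in the quoted question can be tried from C. On Linux, glibc's sigqueue(3) is built on the rt_sigqueueinfo system call and attaches a small payload to a signal. The sketch below is illustrative only and is not code from this thread; the helper name queue_and_receive is hypothetical. It queues SIGUSR1 to the calling process and reads the siginfo back synchronously with sigwaitinfo(). Note that sigqueue() fixes si_code to SI_QUEUE, so this wrapper alone does not reproduce an arbitrary saved siginfo.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): queue SIGUSR1 with an integer
 * payload to the calling process and collect the resulting siginfo.
 * Assumes a single-threaded caller, since it uses sigprocmask().
 * Returns 0 on success and -1 on failure.
 */
int queue_and_receive(int payload, siginfo_t *out)
{
    sigset_t set, old;
    union sigval value;
    int rc = -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block the signal so it stays pending until sigwaitinfo() takes it. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    value.sival_int = payload;
    if (sigqueue(getpid(), SIGUSR1, value) == 0 &&
        sigwaitinfo(&set, out) == SIGUSR1)
        rc = 0;

    /* Restore the caller's original signal mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return rc;
}
```

A caller would pass a siginfo_t and then inspect si_signo, si_code, and si_value.sival_int in the result.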

You assume that c/r is done by the checkpointed processes _themselves_,
that is, that to checkpoint a process, that process needs to be made
runnable so that it can save its own state (which is the model of dmtcp,
but not of using ptrace).

This model is restrictive: it requires that you hijack the execution of
that process somehow and make it run. What if the process isn't runnable
(e.g. in vfork waiting for completion, or ptraced deep in the kernel)?
Letting it run even just a bit may modify its state. It also means that
if you have many processes in the checkpointed session, e.g. 1000, then
_all_ of them will have to be scheduled to run!

With kernel c/r this is unnecessary:  you can use an auxiliary process
to checkpoint other processes without scheduling the other processes.
I.e. it's _transparent_ and _preemptive_.
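
To make the "auxiliary process" idea concrete, here is a minimal userspace sketch. It is not kernel c/r and not code from any of the patch sets under discussion. It only shows an outside process reading a piece of state about a stopped process through /proc without that process ever being scheduled. It assumes Linux with /proc mounted; the helper names read_proc_state and state_of_stopped_child are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter state field from /proc/<pid>/stat, or '?' on error. */
char read_proc_state(pid_t pid)
{
    char path[64], buf[512];
    char *close_paren;
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Layout is "pid (comm) state ..."; comm may itself contain ')'. */
    close_paren = strrchr(buf, ')');
    if (!close_paren || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}

/*
 * Fork a child that stops itself, then inspect it from the parent while it
 * is stopped. The child does not run between the stop and the inspection.
 */
char state_of_stopped_child(void)
{
    int status;
    char state = '?';
    pid_t pid = fork();

    if (pid < 0)
        return '?';
    if (pid == 0) {
        raise(SIGSTOP);          /* child: stop before doing anything else */
        _exit(0);
    }

    /* Parent: wait until the child is reported as stopped, then read state. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        state = read_proc_state(pid);

    kill(pid, SIGKILL);          /* clean up without ever resuming the child */
    waitpid(pid, NULL, 0);
    return state;
}
```

The state letter is of course a tiny fraction of what a real checkpoint has to capture; the point is only that the observer, not the target, does the work.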

Another advantage is that if anything fails during checkpoint (for 
whatever reason), there are no side-effects (which is not the case with
the other method).

> > For every small piece of functionality you will need to export ABI
> > and maintain it forever.  It's thousands of APIs! And why the hell
> > they are needed in user space at all?
> 
> I think it's actually quite the contrary.  Most things are already
> visible to userland.  They _have_ to be and that's the reason why
> userland implementation can already get most things working without
> any change to the kernel with some amount of hackery.  To me in-kernel
> CR seems to approach the problem from the exactly wrong direction -
> rather than dealing with specific exceptions, it creates a completely
> new framework which is very foreign and not useful outside of CR.
> 
> Also, think about it.  Which one is better?  A kernel which can fully
> show its ABI visible states to userland or one which dumps its
> internal data structures in binary blobs.  To me, the latter seems
> multiple orders of magnitude uglier.

Are we judging aesthetics?  To me the former looks uglier...

The number of fragile hacks you need to go through to make it work
in userspace for the generic cases (including userspace trickery
and new crazy APIs from the kernel for state that was never even an
ABI, like skb's), and the restrictions it imposes, simply suggest that
userspace is not the right place to do it.

Thanks,

Oren.