linux-kernel - Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

Message-ID: <49E4E4AB.1030803@cs.columbia.edu>
Date:	Tue, 14 Apr 2009 15:31:55 -0400
From:	Oren Laadan <orenl@...columbia.edu>
To:	Alexey Dobriyan <adobriyan@...il.com>
CC:	Dave Hansen <dave@...ux.vnet.ibm.com>, akpm@...ux-foundation.org,
	containers@...ts.linux-foundation.org, xemul@...allels.com,
	serue@...ibm.com, mingo@...e.hu, hch@...radead.org,
	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style



Alexey Dobriyan wrote:
> On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote:
>>
>> Alexey Dobriyan wrote:
>>> On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote:
>>>> Alexey Dobriyan wrote:
>>>>> On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote:
>>>>>> I'm curious how you see these fitting in with the work that we've been
>>>>>> doing with Oren.  Do you mean to just start a discussion or are you
>>>>>> really proposing these as an alternative to what Oren has been posting?
>>>>> Yes, this is posted as alternative.
>>>>>
>>>>> Some design decisions are seen as incorrect from here like:
>>>> A definition of "design" would help; I find most of your comments
>>>> below either vague, cryptic, or technical nits...
>>>>
>>>>> * not rejecting checkpoint with possible "leaks" from container
>>>> ...like this, for example.
>>> Like checkpointing one process out of many living together.
>> See the thread on creating tasks in userspace vs. kernel space:
>> the argument here is that there is an interesting enough use case for
>> a checkpoint of not-an-entire-container.
>>
>> Of course it will require more logic to it, so the user can choose
>> what she cares or does not care about, and the kernel could alert
>> the user about it.
>>
>> The point is, that it is, IMHO, a desirable capability.
>>
>>> If you allow this you consequently drop checks (e.g. refcount checks)
>>> for "somebody else is using structure to be checkpointed".
>>>
>> From this point below, I totally agree with you that for the purpose
>> of a whole-container-checkpoint this is certainly desirable. My point
>> was that it can easily be added to the existing patchset (not yours).
>> Why not add it there ?
>>
>>> If you drop these checks, you can't distinguish legal situations like
>>> "process genuinely doesn't care about routing table of netns it lives in"
>>> from "illegal" situations like "process created shm segment but currently
>>> doesn't use it so not checkpointing ipcns will result in breakage later".
>>>
>>> You'll have to move responsibility to the user, so the user must know exactly
>>> what the app relies on and what it doesn't. And probably add flags like CKPT_SHM,
>>> CKPT_NETNS_ROUTE ad infinitum.
>>>
>>> And user will screw it badly and complain: "after restart my app
>>> segfaulted". And user himself is screwed now: old running process is
>>> already killed (it was checkpointed on purpose) and new process in image
>>> segfaults every time it's restarted.
>>>
>>> All of this in our opinion results in doing C/R unreliably and badly.
>>>
>>> We are going to do it well and dig from the other side.
>>>
>>> If "leak" (any "leak") is detected, C/R is aborted because kernel
>>> doesn't know what app relies on and what app doesn't care about.
>>>
>>> This protects from the situations and failure modes described above.
>>>
>>> This also protects to some extent from in-kernel changes where C/R code
>>> should have been updated but wasn't. Person doing incomplete change won't
>>> notice e.g. refcount checks and won't try to "fix" them. But we'll notice it,
>>> e.g. when running testsuite (amen) and update C/R code accordingly.
>>>
>>> I'm talking about these checks so that everyone understands:
>>>
>>> 	for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) {
>>> 		struct mm_struct *mm = obj->o_obj;
>>> 		unsigned int cnt = atomic_read(&mm->mm_users);
>>>
>>> 		if (obj->o_count != cnt) {
>>> 			printk("%s: mm_struct %p has external references %lu:%u\n",
>>> 			       __func__, mm, obj->o_count, cnt);
>>> 			return -EINVAL;
>>> 		}
>>> 	}
>>>
>>> They are like motion detectors: small and invisible. Something moved, you don't
>>> know what, but it doesn't matter because you have to investigate anyway.
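A user-space sketch of the idea behind the check quoted above, not code from
either patchset: each object collected during the checkpoint walk records how
many references the walk itself found, and any mismatch with the object's own
reference count is treated as a leak. The names struct object,
struct collected and count_leaks() are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel object such as an mm_struct. */
struct object {
	const char *name;
	unsigned int refcount;	/* every user, inside or outside the checkpointed set */
};

/* Hypothetical bookkeeping entry kept by the checkpoint walk. */
struct collected {
	const struct object *obj;
	unsigned int seen;	/* references found while walking the set */
};

/*
 * Return how many collected objects have a reference count that differs
 * from the number of references the walk accounted for. A non-zero result
 * means something outside the set (or something the walk does not know
 * about) is using an object, so an abort-on-leak policy would refuse to
 * checkpoint.
 */
size_t count_leaks(const struct collected *set, size_t n)
{
	size_t leaks = 0;

	assert(set != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (set[i].seen != set[i].obj->refcount) {
			fprintf(stderr, "%s: external references %u:%u\n",
				set[i].obj->name, set[i].seen,
				set[i].obj->refcount);
			leaks++;
		}
	}
	return leaks;
}
```

With two collected objects, one of which has a reference the walk never saw,
count_leaks() returns 1, and a checkpoint following the abort-on-leak policy
would stop there.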
>>>
>>> In this scheme, if user wants to checkpoint just one process, he should
>>> start it alone in a separate container. Right now, in the posted patchset,
>>> that means as a process cloned with
>>> CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET
>> So you suggest that to checkpoint a single process, say a cpu job that
>> would run a week, which runs in the topmost pid_ns, I will need to
>> checkpoint the entire topmost pid_ns (as a container, if at all possible
>> - surely there will be non-checkpointable tasks there) and then in
>> user-space filter out the data and leave only one task, and then to
>> restart I'll use a container again ?
> 
> No, you do a little preparation and start the CPU job in a container from
> the very beginning.

So you are denying all those other users that don't want to do that
the joy of checkpointing and restarting their stuff ... :(

Or, for users who do run everything in a container, but some task is not
checkpointable - it is using this electronic microscope device attached
to their handheld. Alas, they do want to checkpoint that useful program
they are running there that calculates Fibonacci numbers ...

Or, a nested container that shares something with the parent container,
so is not checkpointable by itself...

Ok, you probably got the idea.

Oren.
