netdev - Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Message-ID: <4B8AE8C1.1030305@free.fr>
Date:	Sun, 28 Feb 2010 23:05:53 +0100
From:	Daniel Lezcano <daniel.lezcano@...e.fr>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	Pavel Emelyanov <xemul@...allels.com>, hadi@...erus.ca,
	Patrick McHardy <kaber@...sh.net>,
	Linux Netdev List <netdev@...r.kernel.org>,
	containers@...ts.linux-foundation.org,
	Netfilter Development Mailinglist 
	<netfilter-devel@...r.kernel.org>,
	Ben Greear <greearb@...delatech.com>,
	Serge Hallyn <serue@...ibm.com>,
	Matt Helsley <matthltc@...ibm.com>
Subject: Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Eric W. Biederman wrote:
> Pavel Emelyanov <xemul@...allels.com> writes:
> 
>> Eric W. Biederman wrote:
>>> Pavel Emelyanov <xemul@...allels.com> writes:
>>>
>>>> Eric W. Biederman wrote:
>>>>> Pavel Emelyanov <xemul@...allels.com> writes:
>>>>>
>>>>>> Thanks. What's the problem with setns?
>>>>> joining a preexisting namespace is roughly the same problem as
>>>>> unsharing a namespace.  We simply haven't figured out how to do it
>>>>> safely for the pid and the uid namespaces.
>>>> The pid may change after this for sure. What problems do you know
>>>> about it? What if we try to allocate the same PID in a new space
>>>> or return -EBUSY? This will be a good starting point. If we manage
>>>> to fix it later this will not break the API at all.
>>> Parentage.  The pid is the identity of a process and all kinds of things
>>> make assumptions in all kinds of strange places.  I don't see how
>>> waitpid can work if you change the pid.
>> Agree. But what if we enter a pid space, which is a subnamespace of a current
>> one? In that case parent will still see the task by its old pid. We can restrict
>> first version of entering with this rule as well and this restriction will not
>> block us in typical usecase (I mean enter a container from a host).
> 
> When I was thinking about pid namespaces and unshare last time, the idea I came
> to was that unsharing the pid namespace should only affect which pid namespace
> your children are in.
> 
> I remember that to do that there were a few cases where you would have to access
> task->pid->pid_ns instead of task->nsproxy->pid_ns, but essentially it was pretty
> simple.
> 
>>> glibc doesn't cope if you change someone's pid.
>> OK, but what if we try to allocate the same pid returning -EBUSY on failure?
>>
>> My aim is to provide even a restricted enter. For most of the cases this
>> should work and make our lives easier. So two restrictions currently:
>> a) enter a sub namespace
>> b) allocate the same pid as we have now
>>
>> Hm? :)
> 
> Replacing struct pid is guaranteed to do all kinds of nasty things with
> signal handling and the like, de_thread is nasty enough and you are talking
> something worse.  So if we can change pid namespaces without changing
> the pid I am for it.

I agree with all the points you and Pavel raised, but I am not
comfortable with having the current process switch pid namespaces,
because of the process tree hierarchy (for example, what will be the
parent of the process once it enters the pid namespace?). How does this
differ from sys_bindns or sys_hijack, which were proposed a couple of
years ago?

I made a suggestion some weeks ago about a new syscall, 'cloneat', where
the new process becomes the child of the target process specified in
the syscall. Maybe it would be interesting to replace 'setns' with, or
add alongside it, a 'cloneat' syscall that takes the file descriptor as
a parameter. The copy_process function would then use the nsproxy
provided through the fd argument rather than the caller's nsproxy.

The newly created process becomes the child of the process from which
we retrieved the namespace with nsfd, and that process has to 'waitpid'
it (the caller of 'cloneat' cannot wait for it). It is a bit similar to
the CLONE_PARENT flag, except the creation order is inverted (the father
creates for the child).

So when entering the container, we specify pid 1 of the container,
which is usually a child reaper.
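
To make the intended semantics a bit more concrete, here is a toy
userspace model, not kernel code. All the names in it (task_model,
nsproxy_model, clone_model, clone_parent_model, cloneat_model, may_wait)
are made up for illustration, and 'cloneat' itself is only a proposal.
It only records who the parent is and which nsproxy the child gets in
each case:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the real definitions. */
struct nsproxy_model {
	int pid_ns;			/* identifies a pid namespace */
};

struct task_model {
	int id;
	struct task_model *parent;	/* the task allowed to waitpid() us */
	const struct nsproxy_model *nsproxy;
};

/* Plain clone/fork: child of the caller, in the caller's namespaces. */
void clone_model(struct task_model *child, int id, struct task_model *caller)
{
	assert(child && caller);
	child->id = id;
	child->parent = caller;
	child->nsproxy = caller->nsproxy;
}

/* CLONE_PARENT: sibling of the caller, still in the caller's namespaces. */
void clone_parent_model(struct task_model *child, int id,
			struct task_model *caller)
{
	assert(child && caller);
	child->id = id;
	child->parent = caller->parent;
	child->nsproxy = caller->nsproxy;
}

/*
 * Hypothetical cloneat: the child is attached under the target task and
 * takes the target's nsproxy; nothing is inherited from the caller here.
 */
void cloneat_model(struct task_model *child, int id,
		   struct task_model *caller, struct task_model *target)
{
	assert(child && caller && target);
	(void)caller;
	child->id = id;
	child->parent = target;
	child->nsproxy = target->nsproxy;
}

/* Only the parent can wait for a child. */
int may_wait(const struct task_model *waiter, const struct task_model *child)
{
	return child->parent == waiter;
}
```

With a host shell as the caller and the container's pid 1 as the target,
may_wait() is true for the container's pid 1 and false for the shell,
which is the behaviour described above.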

Does that make sense?

Thanks
   -- Daniel



