netdev - Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Message-ID: <m1aaulyy5c.fsf@fess.ebiederm.org>
Date:	Sat, 06 Mar 2010 12:48:31 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Daniel Lezcano <daniel.lezcano@...e.fr>
Cc:	Pavel Emelyanov <xemul@...allels.com>,
	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>,
	Serge Hallyn <serue@...ibm.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	containers@...ts.linux-foundation.org,
	Netfilter Development Mailinglist 
	<netfilter-devel@...r.kernel.org>,
	Ben Greear <greearb@...delatech.com>
Subject: Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Daniel Lezcano <daniel.lezcano@...e.fr> writes:

> Eric W. Biederman wrote:

> If the normal rules of parentage apply, that means pid 0 has to wait for its
> child. If we are in the scenario of pid 0 and its child pid 1234, and we kill
> pid 1 of the pid namespace, I suppose pid 1234 will be killed too.
> Pid 0 will stay in the pid namespace and will be able to fork a new pid 1
> again.
>
> I think Serge already reported that...
>
> That sounds good :)

I expect zap_pid_ns_processes should also arrange that we cannot allocate any
more processes.  We certainly need to do something explicit or pid 1 won't
be allocated.  It might make sense to resurrect a pid namespace after its
death but it is definitely weird.

>> In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I
>> don't think anything I am doing fundamentally undermines it.  The use
>> case of doing things in fork is that there is automatic inheritance of
>> everything.  All of the namespaces and all of the control groups, and
>> possibly also the parent process.  
> And also the rootfs for executing the command inside the container
> (eg. shutdown), the uid/gid (if there is a user namespace), the mount points,
> ...
> But I suppose we can do the same with setns for all the namespaces and chrooting
> within the container rootfs.
>
> What I see is a problem with the tty. For example, we cloneat the init process
> of the container which is usually /sbin/init but this one has its tty mapped to
> /dev/console, so the output of the exec'ed command will go to the console.

My original thinking was that the fd's would come from the caller of sys_cloneat....

>> Overall it sounds like the semantics I have proposed with
>> unshare(CLONE_NEWPID) are workable, and simple to implement.  The
>> extra fork is a bit surprising but it certainly does not
>> look like a show stopper for implementing a pid namespace join.
>>   
> I agree, it's some kind of "ghost" process.
> IMO, with a bit of userspace code it would be possible to enter or exec a
> command inside a container with nsfd, setns.
>
> +1 to test your patchset Eric :)

I will see about reposting sometime soon.
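
For reference, a minimal sketch of the unshare(CLONE_NEWPID) plus fork
pattern discussed above. It is written against the unshare(2) behaviour
that later landed in mainline Linux, which is an assumption relative to
the patches in this RFC and may differ from them. The helper name
first_child_is_pid_one is hypothetical. The caller stays in its original
pid namespace; only the child it forks afterwards enters the new one,
which is the "extra fork".

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: unshare the pid namespace, fork once, and report
 * whether the child saw itself as pid 1.
 *
 * Returns 1 if the child was pid 1, 0 if it was not, and -1 if
 * unshare(), fork() or waitpid() failed (errno is left set).
 */
int first_child_is_pid_one(void)
{
    pid_t child;
    int status;

    /* Only future children are affected; the caller keeps its pid. */
    if (unshare(CLONE_NEWPID) < 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* First process in the new namespace: expected to be pid 1. */
        _exit(getpid() == 1 ? 0 : 1);
    }

    /* Normal parentage rules: the caller has to wait for its child. */
    if (waitpid(child, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The call needs CAP_SYS_ADMIN, so an unprivileged caller should expect -1
with errno set to EPERM. In mainline, once that pid 1 has exited the
caller cannot fork further children into the same namespace, which is
the "do something explicit" case raised earlier in this message.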

> Just a mindless suggestion: the "nsopen" / "nsattach" syscall names would be
> clearer, no?

Not bad suggestions.

I am going to explore a bit more.  Given that nsfd is using the same
permission checks as a proc file, I think I can just make it a proc
file.  Something like "/proc/<pid>/ns/net".  With a little luck that
won't suck too badly.
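
A minimal sketch of how userspace could use such a proc file, assuming
the setns(2) call and the /proc/<pid>/ns/<name> layout as they later
appeared in mainline Linux; the interface in this RFC may differ. The
helper names ns_path and join_ns are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "/proc/<pid>/ns/<name>" into buf.
 * Returns 0 on success, -1 with errno set if name contains a '/'
 * or the result does not fit in buf.
 */
int ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
    int n;

    if (strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }
    n = snprintf(buf, len, "/proc/%ld/ns/%s", (long)pid, name);
    if (n < 0 || (size_t)n >= len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: join the namespace <name> of process <pid>.
 * nstype is a CLONE_NEW* flag (for example CLONE_NEWNET), or 0 to
 * accept any namespace type. Returns 0 on success, -1 with errno set.
 */
int join_ns(pid_t pid, const char *name, int nstype)
{
    char path[64];
    int fd, ret, saved_errno;

    if (ns_path(path, sizeof(path), pid, name) < 0)
        return -1;

    /* Opening is subject to the usual proc file permission checks. */
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    ret = setns(fd, nstype);

    /* Preserve errno from setns() across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

With these helpers, join_ns(1234, "net", CLONE_NEWNET) would move the
caller into the network namespace of pid 1234, given sufficient
privileges. Holding the open fd keeps a reference on the namespace,
which is the situation Daniel asks about below.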

> Jumping back, one question about the nsfd and the poll for waiting for the
> end of the namespace.
> If we have an opened file descriptor on a specific namespace, we grab a
> reference on it, so the namespace won't be destroyed until we close the fd
> which is used to poll for the end of the namespace, no? Did I miss something?

Not really.  The assumption was that there would be a very similar
file descriptor that we could use with poll.

Eric
