Date:	Fri, 1 Mar 2013 10:57:40 +0100
From:	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To:	Rob Landley <rob@...dley.net>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	linux-man <linux-man@...r.kernel.org>,
	Linux Containers <containers@...ts.linux-foundation.org>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: For review: pid_namespaces(7) man page

Hi Rob,

On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <rob@...dley.net> wrote:
> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
[...]

>> DESCRIPTION
>>        For an overview of namespaces, see namespaces(7).
>>
>>        PID  namespaces  isolate  the  process ID number space, meaning
>>        that processes in different PID namespaces can  have  the  same
>>        PID.
>
>
> Um, perhaps "different processes"? Slightly repetitive, but trying to avoid
> the potential misreading that "a process can have the same PID in
> different namespaces". (A single process can't be a member of more than one
> namespace. This is not about selective visibility.)

I'm not sure this clarifies things...
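
For concreteness, here is a rough sketch of what the sentence is getting at.
It is an illustration, not proposed page text. The helper name
pid_seen_in_new_namespace() is made up, and the sketch assumes a kernel that
lets an unprivileged process combine CLONE_NEWUSER with CLONE_NEWPID in
unshare(2); where that is not permitted it simply reports -1. The first
process created in the new PID namespace sees itself as PID 1, even though a
different process in the initial namespace already has that PID.

```c
#define _GNU_SOURCE             /* exposes unshare() and CLONE_* in <sched.h> */
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the PID that the first process in a freshly
 * created PID namespace sees for itself, or -1 if the namespace could not
 * be created (for example, because unshare(2) is not permitted here).
 */
long pid_seen_in_new_namespace(void)
{
    int pfd[2];
    long inner = -1;

    if (pipe(pfd) == -1)
        return -1;

    /* Do the unshare() in a throwaway helper so the caller is unaffected. */
    pid_t helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (helper == 0) {
        close(pfd[0]);
        /* CLONE_NEWUSER is an assumption made to avoid needing root. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0) {
            /* unshare() affects only children created from now on. */
            pid_t child = fork();
            if (child == 0) {
                long self = (long) getpid();    /* PID inside new namespace */
                if (write(pfd[1], &self, sizeof(self)) != sizeof(self))
                    _exit(1);
                _exit(0);
            }
            if (child > 0) {
                waitpid(child, NULL, 0);
                _exit(0);
            }
        }
        /* Namespace creation or fork failed: report -1 to the caller. */
        if (write(pfd[1], &inner, sizeof(inner)) != sizeof(inner))
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);
    if (read(pfd[0], &inner, sizeof(inner)) != sizeof(inner))
        inner = -1;
    close(pfd[0]);
    waitpid(helper, NULL, 0);
    return inner;
}
```

Where the unshare(2) call is allowed, the function returns 1, while the same
child is known by an ordinary PID in the namespace of its parent.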

>> PID namespaces allow containers to migrate to a new host
>>        while the processes inside  the  container  maintain  the  same
>>        PIDs.
>
>
> I thought suspend/resume a container was the simple case. Migration to a new
> host is built on top of that. (On resume in a new container on the same
> system, if other stuff is going on in the system so the available PIDs have
> shifted.)

I'll add some words here on suspend/resume.

>>        Likewise, a process in an ancestor namespace can—subject to the
>>        usual permission checks described in  kill(2)—send  signals  to
>>        the  "init" process of a child PID namespace only if the "init"
>>        process has established a handler for that signal.  (Within the
>>        handler,  the  siginfo_t si_pid field described in sigaction(2)
>>        will be zero.)  SIGKILL or SIGSTOP are  treated  exceptionally:
>>        these signals are forcibly delivered when sent from an ancestor
>>        PID namespace.  Neither of these signals can be caught  by  the
>>        "init" process, and so will result in the usual actions associ‐
>>        ated with those signals (respectively, terminating and stopping
>>        the process).
>
>
> If SIGKILL to init is propagated to all the children of init, is SIGSTOP
> also propagated to all the children? (I.e., will SIGSTOP to container's init
> suspend the whole container, and will SIGCONT resume the whole container? If
> the latter, will it only resume processes that weren't previously stopped?
> :)

Covered by Eric.

>>        To put things another way: a process's PID namespace membership
>>        is determined when the process is created and cannot be changed
>>        thereafter.  Among other things, this means that  the  parental
>>        relationship between processes mirrors the parental between PID
>
>
> mirrors the relationship

Thanks.

>>        namespaces: the parent of a  process  is  either  in  the  same
>>        namespace or resides in the immediate parent PID namespace.
>>
>>        Every  thread  in  a process must be in the same PID namespace.
>>        For this reason, the two following call sequences will fail:
>>
>>            unshare(CLONE_NEWPID);
>>            clone(..., CLONE_VM, ...);    /* Fails */
>>
>>            setns(fd, CLONE_NEWPID);
>>            clone(..., CLONE_VM, ...);    /* Fails */
>
>
> They fail with -EUNDOCUMENTED

Added EINVAL, as per Eric's reply. (Eric, does that error also apply
to the two new cases you added?)

>>        Because the above unshare(2) and setns(2) calls only change the
>>        PID  namespace  for created children, the clone(2) calls neces‐
>>        sarily put the new thread in a different PID namespace from the
>>        calling thread.
>
>
> Um, no they don't. They fail. That's the point.

(Good catch.)

> They _would_ put the new
> thread in a different PID namespace, which breaks the definition of threads.
>
> How about:
>
> The above unshare(2) and setns(2) calls change the PID namespace of
> children created by subsequent clone(2) calls, which is incompatible
> with CLONE_VM.

I decided on:

       The  point  here is that unshare(2) and setns(2) change the PID
       namespace for created children but not for the calling process,
       while  clone(2) CLONE_VM specifies the creation of a new thread
       in the same process.

>>    Miscellaneous
>>        After  creating a new PID namespace, it is useful for the child
>>        to change its root directory and mount a new procfs instance at
>>        /proc  so  that  tools such as ps(1) work correctly.  (If a new
>>        mount  namespace  is  simultaneously   created   by   including
>>        CLONE_NEWNS  in  the flags argument of clone(2) or unshare(2)),
>>        then it isn't necessary to change the  root  directory:  a  new
>>        procfs instance can be mounted directly over /proc.)
>
>
> Why is the (If) clause in parentheses? And unshare(2)) has a Bruce.
> (I.e., unbalanced parens.)

I'll make some fixes here.

>>        Calling  readlink(2)  on the path /proc/self yields the process
>>        ID of the caller in the  PID  namespace  of  the  procfs  mount
>>        (i.e.,  the  PID  namespace  of  the  process  that mounted the
>>        procfs).
>
>
> This is per-filesystem rather than using the process's namespace because...?
> (Where /proc/self points is already process-local data, so the races here
> can't be too horrible...)

Explained by Eric.

I'll add:

[[
This can be useful for introspection purposes,
when a process wants to discover its PID in other namespaces.
]]
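
As a small illustration of the readlink(2) behavior described above, the
following sketch reads the /proc/self link and parses its target as a number.
The helper name proc_self_pid() is hypothetical, and the sketch assumes a
procfs instance is mounted at /proc. If that procfs was mounted from the
caller's own PID namespace, the value should match getpid(2); if it was
mounted from another namespace, the value is the caller's PID as seen there.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the caller's PID as reported by the procfs
 * mounted at /proc, or -1 if the link cannot be read or parsed.
 */
long proc_self_pid(void)
{
    char buf[32];
    char *end;

    /* readlink() does not null-terminate, so leave room for '\0'. */
    ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';

    /* The link target is expected to be a plain decimal PID. */
    long pid = strtol(buf, &end, 10);
    if (end == buf || *end != '\0')
        return -1;

    return pid;
}
```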

[...]

>> CONFORMING TO
>>        Namespaces are a Linux-specific feature.
>
>
> And yet the glibc guys insist on #define GNU_GNU_GNU_ALL_HAIL_STALLMAN in
> order to access this Linux-specific feature which has nothing whatsoever to
> do with the FSF.

This is a misunderstanding. _GNU_SOURCE is the standard way to expose
Linux-specific functionality from POSIX header files.
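
A minimal sketch of what that looks like in practice, assuming glibc (the
wrapper name unshare_pid_namespace() is made up for illustration). The
feature test macro has to be defined before the first header is included,
otherwise <sched.h> declares neither unshare() nor CLONE_NEWPID.

```c
#define _GNU_SOURCE     /* must come before the first #include */
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical wrapper: returns 0 on success, or the errno value set by
 * unshare(2) on failure (typically EPERM for an unprivileged caller).
 */
int unshare_pid_namespace(void)
{
    if (unshare(CLONE_NEWPID) == -1)
        return errno;
    return 0;
}
```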

> The unshare() call originally _didn't_ require this define, but they
> retroactively added the requirement in a version "upgrade" to match your man
> page. This made me sad. It also made me prototype it myself rather than
> expecting the header to provide it.

Hmmm. I did not notice that change. Ulrich rejected my early (2007)
request for a change
(http://www.sourceware.org/bugzilla/show_bug.cgi?id=4749) and then
quietly made it later (glibc 2.14, 2011).

Thanks for the review, Rob.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/