linux-kernel - Re: For review: rewritten pivot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7f2907f5-442a-6187-d0ad-e2fd345cd450@gmail.com>
Date:   Wed, 9 Oct 2019 23:01:33 +0200
From:   "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     mtk.manpages@...il.com, Philipp Wendler <ml@...lippwendler.de>,
        "Serge E. Hallyn" <serge@...lyn.com>,
        Christian Brauner <christian@...uner.io>,
        Aleksa Sarai <asarai@...e.de>,
        Reid Priedhorsky <reidpr@...l.gov>,
        Andy Lutomirski <luto@...capital.net>,
        Yang Bo <rslovers@...dex.com>, Jakub Wilk <jwilk@...lk.net>,
        Joseph Sible <josephcsible@...il.com>,
        Al Viro <viro@....linux.org.uk>, werner@...esberger.net,
        linux-man <linux-man@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Containers <containers@...ts.linux-foundation.org>,
        Stéphane Graber <stgraber@...ntu.com>
Subject: Re: For review: rewritten pivot_root(2) manual page

Hello Eric,

On 10/9/19 6:00 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com> writes:
> 
>> Hello Eric,
>>
>> Thank you. I was hoping you might jump in on this thread.
>>
>> Please see below.
>>
>> On 10/9/19 10:46 AM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com> writes:
>>>
>>>> Hello Philipp,
>>>>
>>>> My apologies that it has taken a while to reply. (I had been hoping
>>>> and waiting that a few more people might weigh in on this thread.)
>>>>
>>>> On 9/23/19 3:42 PM, Philipp Wendler wrote:
>>>>> Hello Michael,
>>>>>
>>>>> Am 23.09.19 um 14:04 schrieb Michael Kerrisk (man-pages):
>>>>>
>>>>>> I'm considering to rewrite these pieces to exactly
>>>>>> describe what the system call does (which I already
>>>>>> do in the third paragraph) and remove the "may or may not"
>>>>>> pieces in the second paragraph. I'd welcome comments
>>>>>> on making that change.
>>
>> What did you think about my proposal above? To put it in context,
>> this was my initial comment in the mail:
>>
>> [[
>> One area of the page that I'm still not really happy with
>> is the "vague" wording in the second paragraph and the note
>> in the third paragraph about the system call possibly
>> changing. These pieces survive (in somewhat modified form)
>> from the original page, which was written before the
>> system call was released, and it seems there was some
>> question about whether the system call might still change
>> its behavior with respect to the root directory and current
>> working directory of other processes. However, after 19
>> years, nothing has changed, and surely it will not in the
>> future, since that would constitute an ABI breakage.
>> I'm considering to rewrite these pieces to exactly
>> describe what the system call does (which I already
>> do in the third paragraph) and remove the "may or may not"
>> pieces in the second paragraph. I'd welcome comments
>> on making that change.
>> ]]
>>
>> And the second and third paragraphs of the manual page currently
>> read:
>>
>> [[
>>        pivot_root()  may  or may not change the current root and the cur‐
>>        rent working directory of any processes or threads  that  use  the
>>        old  root  directory  and which are in the same mount namespace as
>>        the caller of pivot_root().  The  caller  of  pivot_root()  should
>>        ensure  that  processes  with root or current working directory at
>>        the old root operate correctly in either case.   An  easy  way  to
>>        ensure  this is to change their root and current working directory
>>        to  new_root  before  invoking  pivot_root().   Note   also   that
>>        pivot_root()  may  or may not affect the calling process's current
>>        working directory.  It is therefore recommended to call chdir("/")
>>        immediately after pivot_root().
>>
>>        The  paragraph  above  is  intentionally vague because at the time
>>        when pivot_root() was first implemented, it  was  unclear  whether
>>        its  affect  on  other process's root and current working directo‐
>>        ries—and the caller's current working  directory—might  change  in
>>        the  future.   However, the behavior has remained consistent since
>>        this system call was first implemented: pivot_root()  changes  the
>>        root  directory  and the current working directory of each process
>>        or thread in the same mount namespace to new_root if they point to
>>        the  old  root  directory.   (See also NOTES.)  On the other hand,
>>        pivot_root() does not change the caller's current  working  direc‐
>>        tory  (unless it is on the old root directory), and thus it should
>>        be followed by a chdir("/") call.
>> ]]
> 
> Apologies I saw that concern I didn't realize it was a questio
> 
> I think it is very reasonable to remove warning the behavior might
> change.  We have pivot_root(8) in common use that to use it requires
> the semantic of changing processes other than the current process.
> Which means any attempt to noticably change the behavior of
> pivot_root(2) will break userspace.

Thanks for the confirmation that this change would be okay.
I will make this change soon, unless I hear a counterargument.

> Now the documented semantics in behavior above are not quite what
> pivot_root(2) does.  It walks all processes on the system and if the
> working directory or the root directory refer to the root mount that is
> being replaced, then pivot_root(2) will update them.
> 
> In practice the above is limited to a mount namespace.  But something as
> simple as "cd /proc/<somepid>/root" can allow a process to have a
> working directory in a different mount namespace.

So, I'm not quite clear. Do you mean that something in the existing
manual page text should change? If so, could you describe the
needed change please?

> Because ``unprivileged'' users can now use pivot_root(2) we may want to
> rethink the implementation at some point to be cheaper than a global
> process walk.  So far that process walk has not been a problem in
> practice.
> 
> If we had to write pivot_root(2) from scratch limiting it to just
> changing the root directory of the process that calls pivot_root(2)
> would have been the superior semantic.  That would have required run
> pivot_root(8) like: "exec pivot_root . . -- /bin/bash ..."  but it would
> not have required walking every thread in the system.

Okay.

[...]

>>>>>> DESCRIPTION
>>>>> [...]>        pivot_root()  changes  the
>>>>>>        root  directory  and the current working directory of each process
>>>>>>        or thread in the same mount namespace to new_root if they point to
>>>>>>        the  old  root  directory.   (See also NOTES.)  On the other hand,
>>>>>>        pivot_root() does not change the caller's current  working  direc‐
>>>>>>        tory  (unless it is on the old root directory), and thus it should
>>>>>>        be followed by a chdir("/") call.
>>>>>
>>>>> There is a contradiction here with the NOTES (cf. below).
>>>>
>>>> See below.
>>>>
>>>>>>        The following restrictions apply:
>>>>>>
>>>>>>        -  new_root and put_old must be directories.
>>>>>>
>>>>>>        -  new_root and put_old must not be on the same filesystem as  the
>>>>>>           current root.  In particular, new_root can't be "/" (but can be
>>>>>>           a bind mounted directory on the current root filesystem).
>>>>>
>>>>> Wouldn't "must not be on the same mountpoint" or something similar be
>>>>> more clear, at least for new_root? The note in parentheses indicates
>>>>> that new_root can actually be on the same filesystem as the current
>>>>> note. However, ...
>>>>
>>>> For 'put_old', it really is "filesystem".
>>>
>>> If we are going to be pedantic "filesystem" is really the wrong concept
>>> here.  The section about bind mount clarifies it, but I wonder if there
>>> is a better term.
>>
>> Thanks. My aim was to try to distinguish "mount point" from
>> "a mount somewhere inside the file system associated with a
>> certain mount point"--in other words, I wanted to make it clear
>> that 'put_old' (and 'new_root') could not be subdirectories
>> under the current root mount point (which is correct, right?).
>>
>> Using "mount" does seem better. (My only concern is that some
>> people may take it to mean "the mount point", but perhaps that
>> just my own confusion.)
> 
> I am open to better terms.  But mount or vfsmount is what we are using
> internal to the kernel and is really a distinct concept from filesystem.
> And it is starting to leak out in system calls like move_mount.

I have no better term to propose.

[...]

>> Thanks for the above comments.
>>
>> Hmm, doI need to make similar changes in the initial paragraph of
>> the manual page as well? It currently reads:
>>
>>        pivot_root() changes the root filesystem in the mount namespace of
>>        the calling process.  More precisely, it moves the root filesystem
>>        to  the directory put_old and makes new_root the new root filesys‐
>>        tem.  The calling process must have the  CAP_SYS_ADMIN  capability
>>        in the user namespace that owns the caller's mount namespace.
>>
>> Furthermore the one line NAME of the man page reads:
>>
>>        pivot_root - change the root filesystem
>>
>> Is a change needed there also?
> 
> Yes please.  Both locations.

Okay. So would the following be okay:

[[
NAME
       pivot_root - change the root mount
...
DESCRIPTION
       pivot_root()  changes the root mount in the mount namespace of the
       calling process.  More precisely, it moves the root mount  to  the
       directory  put_old  and  makes  new_root  the new root mount.  The
       calling process must have the CAP_SYS_ADMIN capability in the user
       namespace that owns the caller's mount namespace.
]]

?

[...]

>>>>>>        -  new_root must be a mount point.  (If  it  is  not  otherwise  a
>>>>>>           mount  point,  it  suffices  to  bind  mount new_root on top of
>>>>>>           itself.)
>>>>>
>>>>> ... this item actually makes the above item almost redundant regarding
>>>>> new_root (except for the "/") case. So one could replace this item with
>>>>> something like this:
>>>>>
>>>>> - new_root must be a mount point different from "/". (If it is not
>>>>>   otherwise a mount point, it suffices  to bind mount new_root on top
>>>>>   of itself.)
>>>>>
>>>>> The above item would then only mention put_old (and maybe use clarified
>>>>> wording on whether actually a different file system is necessary for
>>>>> put_old or whether a different mount point is enough).
>>>>
>>>> Thanks. That's a good suggestion. I simplified the earlier bullet
>>>> point as you suggested, and changed the text here to say:
>>>>
>>>>        -  new_root must be a mount point, but can't be "/".  If it is not
>>>>           otherwise  a mount point, it suffices to bind mount new_root on
>>>>           top of itself.  (new_root can be a bind  mounted  directory  on
>>>>           the current root filesystem.)
>>>
>>> How about:
>>>           - new_root must be the path to a mount, but can't be "/".  Any
>>
>> Surely here it must be "mount point" not "mount"? (See my discussion
>> above.)
> 
> Sigh.  I have had my head in the code to long, where new_root is
> used to refer to the mount that is mounted on that mount point as well.

Okay -- so I made the text here:

       -  new_root  must be a path to a mount point, but can't be "/".  A
          path that is not already a mount point can  be  converted  into
          one by bind mounting the path onto itself.

Okay?

[...]

Thanks, Eric. As always, your input for the man pages is so
valuable. (My only challenge is to keep up with you...)

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/