Open Source and information security mailing list archives
 
Message-ID: <87y2y6j9i1.fsf@x220.int.ebiederm.org>
Date:   Mon, 30 Sep 2019 06:42:30 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     "Michael Kerrisk \(man-pages\)" <mtk.manpages@...il.com>
Cc:     Christian Brauner <christian.brauner@...ntu.com>,
        linux-man <linux-man@...r.kernel.org>,
        Containers <containers@...ts.linux-foundation.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...capital.net>,
        Jordan Ogas <jogas@...l.gov>, werner@...esberger.net,
        Al Viro <viro@....linux.org.uk>
Subject: Re: pivot_root(".", ".") and the fchdir() dance

"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com> writes:

> Hello Eric,
>
> A ping on my question below. Could you take a look please?
>
> Thanks,
>
> Michael
>
>>>>> The concern from our conversation at the container mini-summit was
>>>>> a pathology: if all of the mounts in your initial mount namespace are
>>>>> marked MS_SHARED, as systemd does (and as is almost necessary if you
>>>>> are going to use mount propagation), and new_root itself is
>>>>> MS_SHARED, then unmounting the old_root could propagate.
>>>>>
>>>>> So I believe the desired sequence is:
>>>>>
>>>>>>>>            chdir(new_root);
>>>>> +++            mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>>>>>>            pivot_root(".", ".");
>>>>>>>>            umount2(".", MNT_DETACH);
>>>>>
>>>>> The change to new_root could be either MS_SLAVE or MS_PRIVATE.  So
>>>>> long as it is not MS_SHARED, the mount won't propagate back to the
>>>>> parent mount namespace.
>>>>
>>>> Thanks. I made that change.
>>>
>>> For what it is worth, the sequence above without the change in mount
>>> attributes will fail if it is necessary to change the mount attributes,
>>> as "." is both put_old and new_root.
>>>
>>> When I initially suggested the change I saw "." was new_root and forgot
>>> "." was also put_old.  So I thought there was a silent danger without
>>> that sequence.
>> 
>> So, now I am a little confused by the comments you added here. Do you
>> now mean that the 
>> 
>> mount("", ".", MS_SLAVE | MS_REC, NULL);
>> 
>> call is not actually necessary?

Apologies for being slow getting back to you.

To my knowledge, there are two cases where pivot_root is used:
- In the initial mount namespace, from a ramdisk, when mounting root.
  This is the original use case and is somewhat historical, as rootfs
  (aka the initial ramfs) cannot be unmounted.

- When setting up a new mount namespace to jettison all of the mounts
  you don't need.

The sequence:

        chdir(new_root);
        pivot_root(".", ".");
        umount2(".", MNT_DETACH);

is perfect for both use cases (as nothing needs to be known about the
directory layout of the new root filesystem).
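As a rough sketch, the three-call sequence could be wrapped in a C helper like the one below. The name pivot_into() is hypothetical. glibc provides no pivot_root() wrapper, so the raw system call is made through syscall(2) with SYS_pivot_root. The sketch assumes new_root is already a mount point inside a mount namespace the caller may modify (CAP_SYS_ADMIN).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mount.h>    /* umount2(), MNT_DETACH */
#include <sys/syscall.h>  /* SYS_pivot_root */
#include <unistd.h>       /* chdir(), syscall() */

/* Hypothetical helper: make new_root the root of the calling process's
 * mount namespace using the chdir/pivot_root/umount2 sequence.
 * Returns 0 on success, or -1 with errno set by the call that failed.
 * Assumes new_root is a mount point and the caller has CAP_SYS_ADMIN. */
static int pivot_into(const char *new_root)
{
    assert(new_root != NULL);

    /* Step 1: make new_root the current working directory. */
    if (chdir(new_root) == -1)
        return -1;

    /* Step 2: "." is both new_root and put_old, so the old root ends up
     * stacked on top of the new root.  No glibc wrapper exists, hence
     * the raw system call. */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* Step 3: lazily detach the old root, which now sits on top of ".". */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return 0;
}
```

The helper stops at the first failing call and leaves errno as that call set it; it makes no attempt to undo earlier steps.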

When you are setting up a new mount namespace, propagating changes in
the mount layout to another mount namespace is fatal.  But that is not
a concern when using the pivot_root sequence above, because pivot_root
will fail deterministically if
'mount("", ".", MS_SLAVE | MS_REC, NULL)' is needed but not specified.
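Whether that remount is needed depends on whether the mounts involved are shared. One way to check is /proc/self/mountinfo, where a shared mount carries a "shared:N" optional field (see proc(5)). The parser below is a hypothetical sketch that inspects a single mountinfo line. Exactly which mounts pivot_root examines is specified in pivot_root(2), not by this helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: report whether one /proc/self/mountinfo line
 * describes a mount with shared propagation.  Layout: mount ID, parent
 * ID, major:minor, root, mount point, mount options (fields 1-6), then
 * zero or more optional fields, terminated by a lone "-". */
static bool mountinfo_line_is_shared(const char *line)
{
    char buf[4096];
    size_t len;
    char *save = NULL;
    int field = 0;

    assert(line != NULL);
    len = strlen(line);
    if (len >= sizeof buf)
        return false;              /* too long for this sketch */
    memcpy(buf, line, len + 1);    /* strtok_r() modifies its input */

    for (char *tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (strcmp(tok, "-") == 0)
            break;                 /* end of the optional fields */
        if (field > 6 && strncmp(tok, "shared:", 7) == 0)
            return true;           /* e.g. "shared:1" */
    }
    return false;
}
```

Mount points containing spaces are escaped as \040 in mountinfo, so splitting on spaces is safe for this purpose.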

So I would document the above sequence of three system calls in the
man-page.

I would document that pivot_root will fail if propagation would occur.

I would document in pivot_root or under unshare(CLONE_NEWNS) that if
mount propagation is enabled (the default with systemd), you
need to call 'mount("", "/", MS_SLAVE | MS_REC, NULL);' or
'mount("", "/", MS_PRIVATE | MS_REC, NULL);' after creating a mount
namespace.  Otherwise mounts will propagate backwards, which is usually
not what people want.

Creating a mount namespace in a user namespace automatically does the
equivalent of 'mount("", "/", MS_SLAVE | MS_REC, NULL);' if the starting
mount namespace was not created in that user namespace.  In other words,
creating a mount namespace in a user namespace performs that step for you.

Eric
