Message-ID: <87y2y6j9i1.fsf@x220.int.ebiederm.org>
Date: Mon, 30 Sep 2019 06:42:30 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: "Michael Kerrisk \(man-pages\)" <mtk.manpages@...il.com>
Cc: Christian Brauner <christian.brauner@...ntu.com>,
linux-man <linux-man@...r.kernel.org>,
Containers <containers@...ts.linux-foundation.org>,
lkml <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...capital.net>,
Jordan Ogas <jogas@...l.gov>, werner@...esberger.net,
Al Viro <viro@....linux.org.uk>
Subject: Re: pivot_root(".", ".") and the fchdir() dance
"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com> writes:
> Hello Eric,
>
> A ping on my question below. Could you take a look please?
>
> Thanks,
>
> Michael
>
>>>>> The concern from our conversation at the container mini-summit was that
>>>>> there is a pathology if in your initial mount namespace all of the
>>>>> mounts are marked MS_SHARED like systemd does (and is almost necessary
>>>>> if you are going to use mount propagation), that if new_root itself
>>>>> is MS_SHARED then unmounting the old_root could propagate.
>>>>>
>>>>> So I believe the desired sequence is:
>>>>>
>>>>>>>> chdir(new_root);
>>>>> +++ mount("", ".", NULL, MS_SLAVE | MS_REC, NULL);
>>>>>>>> pivot_root(".", ".");
>>>>>>>> umount2(".", MNT_DETACH);
>>>>>
>>>>> The change to new_root could be either MS_SLAVE or MS_PRIVATE. As
>>>>> long as it is not MS_SHARED, the mount won't propagate back to the
>>>>> parent mount namespace.
>>>>
>>>> Thanks. I made that change.
>>>
>>> For what it is worth, the sequence above without the change in mount
>>> attributes will fail if it is necessary to change the mount attributes,
>>> as "." is both put_old and new_root.
>>>
>>> When I initially suggested the change, I saw that "." was new_root and
>>> forgot that "." was also put_old. So I thought there was a silent danger
>>> without that sequence.
>>
>> So, now I am a little confused by the comments you added here. Do you
>> now mean that the
>>
>> mount("", ".", NULL, MS_SLAVE | MS_REC, NULL);
>>
>> call is not actually necessary?

Apologies for being slow getting back to you.

To my knowledge there are two cases where pivot_root is used:

- In the initial mount namespace, from a ramdisk, when mounting root.
  This is the original use case, and it is now somewhat historical, as
  rootfs (aka the initial ramfs) cannot be unmounted.

- When setting up a new mount namespace, to jettison all of the mounts
  you don't need.

The sequence:

    chdir(new_root);
    pivot_root(".", ".");
    umount2(".", MNT_DETACH);

is perfect for both use cases, as nothing needs to be known about the
directory layout of the new root filesystem (no separate put_old
directory has to exist under new_root).
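
The three-call sequence can be written out as a small C function. This is
a sketch under stated assumptions: the caller has already entered its own
mount namespace and holds CAP_SYS_ADMIN in it, new_root is a mount point
(pivot_root requires that), and the name switch_to_new_root is
hypothetical. glibc provides no pivot_root() wrapper, so the raw system
call is made through syscall(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mount.h>    /* umount2(), MNT_DETACH */
#include <sys/syscall.h>  /* SYS_pivot_root */
#include <unistd.h>       /* chdir(), syscall() */

/* glibc has no pivot_root() wrapper, so invoke the raw system call. */
static int do_pivot_root(const char *new_root, const char *put_old)
{
    return (int) syscall(SYS_pivot_root, new_root, put_old);
}

/*
 * Hypothetical helper: make new_root the root of the calling process's
 * mount namespace. Assumes we are already in a private mount namespace
 * with CAP_SYS_ADMIN and that new_root is a mount point.
 * Returns 0 on success, or -1 with errno set by the first failing call.
 */
int switch_to_new_root(const char *new_root)
{
    assert(new_root != NULL);

    if (chdir(new_root) == -1)
        return -1;

    /* "." is both new_root and put_old: the old root is stacked on "." */
    if (do_pivot_root(".", ".") == -1)
        return -1;

    /* Lazily detach the old root, which now sits on top of "." */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return 0;
}
```

Because chdir() runs first, a bad path fails before any mount state is
touched.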

When you are setting up a new mount namespace, propagating changes in
the mount layout to another mount namespace is fatal. But that is not a
concern for the pivot_root sequence above, because pivot_root will fail
deterministically (with EINVAL) if
'mount("", ".", NULL, MS_SLAVE | MS_REC, NULL)' is needed but not
specified.
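
Whether propagation is in play can also be checked ahead of time: in
/proc/self/mountinfo (format described in proc(5)) the optional fields
between the mount options and the lone "-" separator carry a "shared:N"
tag when a mount is MS_SHARED. The sketch below parses that; the function
names are hypothetical, and it assumes the last "/" entry listed is the
visible one when mounts are stacked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the n-th (1-based) space-separated field of a mountinfo line into
 * out. Returns false if the line has fewer than n fields or the field does
 * not fit in outsz bytes. The kernel escapes spaces in paths as \040, so
 * splitting on ' ' is safe.
 */
static bool mountinfo_field(const char *line, int n, char *out, size_t outsz)
{
    const char *p = line;

    for (int i = 1;; i++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return false;                 /* ran out of fields */
        const char *start = p;
        while (*p != ' ' && *p != '\0' && *p != '\n')
            p++;
        if (i == n) {
            size_t len = (size_t)(p - start);
            if (len >= outsz)
                return false;             /* field too long for out */
            memcpy(out, start, len);
            out[len] = '\0';
            return true;
        }
    }
}

/* True if the optional fields (field 7 up to "-") contain "shared:N". */
bool mountinfo_is_shared(const char *line)
{
    char tok[64];

    for (int n = 7; mountinfo_field(line, n, tok, sizeof tok); n++) {
        if (strcmp(tok, "-") == 0)
            break;                        /* end of optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            return true;
    }
    return false;
}

/*
 * Is the mount on "/" (as seen by this process) shared?
 * 1 = shared, 0 = not shared, -1 = unreadable or no "/" entry found.
 */
int root_is_shared(void)
{
    FILE *f = fopen("/proc/self/mountinfo", "r");
    char line[4096], mnt[4096];
    int result = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Field 5 is the mount point; keep the last "/" entry seen. */
        if (mountinfo_field(line, 5, mnt, sizeof mnt) && strcmp(mnt, "/") == 0)
            result = mountinfo_is_shared(line) ? 1 : 0;
    }
    fclose(f);
    return result;
}
```

A result of 1 from root_is_shared() on a systemd host is the situation in
which the propagation change discussed here is needed.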

So I would document the above sequence of three system calls in the
man page.

I would document that pivot_root will fail if propagation would occur.

I would document, in the pivot_root page or under unshare(CLONE_NEWNS),
that if mount propagation is enabled (the default with systemd) you need
to call 'mount("", "/", NULL, MS_SLAVE | MS_REC, NULL);' or
'mount("", "/", NULL, MS_PRIVATE | MS_REC, NULL);' after creating a
mount namespace. Otherwise mounts will propagate backwards, which is
usually not what people want.
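
The advice for unshare(CLONE_NEWNS) amounts to two calls. A minimal sketch,
assuming the caller has CAP_SYS_ADMIN; the function name unshare_mount_ns
and its up-front check that only MS_SLAVE or MS_PRIVATE is passed are
illustrative additions, not part of any existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>       /* unshare(), CLONE_NEWNS */
#include <stddef.h>      /* NULL */
#include <sys/mount.h>   /* mount(), MS_SLAVE, MS_PRIVATE, MS_REC */

/*
 * Hypothetical helper: move the calling process into a new mount
 * namespace and stop mount events from propagating back to the parent.
 * propagation must be MS_SLAVE (still receive events from the parent)
 * or MS_PRIVATE (no propagation either way).
 * Returns 0 on success, or -1 with errno set.
 */
int unshare_mount_ns(unsigned long propagation)
{
    /* Illustrative guard: reject anything but the two safe choices. */
    if (propagation != MS_SLAVE && propagation != MS_PRIVATE) {
        errno = EINVAL;
        return -1;
    }

    if (unshare(CLONE_NEWNS) == -1)
        return -1;

    /* Recursively change the propagation type of every mount under "/".
       The source and filesystem-type arguments are ignored here. */
    if (mount("", "/", NULL, propagation | MS_REC, NULL) == -1)
        return -1;

    return 0;
}
```

After unshare_mount_ns(MS_SLAVE) succeeds, the pivot_root sequence can run
without mount events leaking into the parent namespace.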

Creating a mount namespace in a user namespace automatically does
'mount("", "/", NULL, MS_SLAVE | MS_REC, NULL);' if the starting mount
namespace was not created in that user namespace. In other words,
creating a mount namespace in a user namespace does that remount for you.

Eric