Message-ID: <20200623115521.hk3xlhixrt2zrgkn@wittgenstein>
Date: Tue, 23 Jun 2020 13:55:21 +0200
From: Christian Brauner <christian.brauner@...ntu.com>
To: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
x86@...nel.org, Dmitry Safonov <dima@...sta.com>,
Andrei Vagin <avagin@...il.com>
Cc: Will Deacon <will@...nel.org>,
Vincenzo Frascino <vincenzo.frascino@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Serge Hallyn <serge@...lyn.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Andy Lutomirski <luto@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Mark Rutland <mark.rutland@....com>, adrian@...as.de
Subject: Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian Brauner wrote:
> So far setns() was missing time namespace support. This was partially due
> to it simply not being implemented but also because vdso_join_timens()
> could still fail which made switching to multiple namespaces atomically
> problematic. This is now fixed so support CLONE_NEWTIME with setns()
>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Michael Kerrisk <mtk.manpages@...il.com>
> Cc: Serge Hallyn <serge@...lyn.com>
> Cc: Dmitry Safonov <dima@...sta.com>
> Cc: Andrei Vagin <avagin@...il.com>
> Signed-off-by: Christian Brauner <christian.brauner@...ntu.com>
> ---
Andrei,
Dmitry,

A little off-topic since it's not related to the patch here, but I've been
going through the current time namespace semantics and I just want to
confirm something with you:
Afaict, unshare(CLONE_NEWTIME) currently works similarly to
unshare(CLONE_NEWPID) in that it only changes {pid,time}_for_children
but does _not_ change the {pid,time} namespace of the caller itself.
For pid namespaces that makes a lot of sense but I'm not completely
clear why you're doing this for time namespaces, especially since the
setns() behavior for CLONE_NEWPID and CLONE_NEWTIME is very different:
Similar to unshare(CLONE_NEWPID), setns(CLONE_NEWPID) doesn't change the
pid namespace of the caller itself, it only changes it for its
children by setting up pid_for_children. _But_ for setns(CLONE_NEWTIME)
both the caller's and the children's time namespace is changed, i.e.
unshare(CLONE_NEWTIME) behaves differently from setns(CLONE_NEWTIME). Why?
This also has the consequence that the unshare(CLONE_NEWTIME) +
setns(CLONE_NEWTIME) sequence can be used to change the caller's time
namespace. Is this intended?
Here's some code where you can verify this (please excuse the awful
code I'm using to illustrate this):
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

int main(int argc, char *argv[])
{
	/* Zero-filled and read with size - 1: readlink() adds no NUL terminator. */
	char buf1[4096] = {0}, buf2[4096] = {0};

	if (unshare(CLONE_NEWTIME))
		exit(1);

	int fd = open("/proc/self/ns/time", O_RDONLY);
	if (fd < 0)
		exit(2);

	readlink("/proc/self/ns/time", buf1, sizeof(buf1) - 1);
	readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2) - 1);
	printf("unshare(CLONE_NEWTIME): time(%s) ~= time_for_children(%s)\n", buf1, buf2);

	if (setns(fd, CLONE_NEWTIME))
		exit(3);

	readlink("/proc/self/ns/time", buf1, sizeof(buf1) - 1);
	readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2) - 1);
	printf("setns(self, CLONE_NEWTIME): time(%s) == time_for_children(%s)\n", buf1, buf2);
	exit(EXIT_SUCCESS);
}
which gives:
root@...vm:/# ./test
unshare(CLONE_NEWTIME): time(time:[4026531834]) ~= time_for_children(time:[4026532366])
setns(self, CLONE_NEWTIME): time(time:[4026531834]) == time_for_children(time:[4026531834])
Why is unshare(CLONE_NEWTIME) blocked from changing the caller's time
namespace when setns(CLONE_NEWTIME) is allowed to do this?
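
In case it helps when comparing the links above programmatically rather
than by eye, here is a minimal sketch. The helper names ns_inum() and
same_ns() are hypothetical and made up for this illustration, and the
sketch assumes link targets of the form "<type>:[<inode>]" as printed in
the output above, with both links being of the same namespace type:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the inode number from a namespace link
 * target such as "time:[4026531834]". Returns 0 if the string does not
 * have the "<type>:[<inode>]" shape.
 */
static unsigned long ns_inum(const char *link)
{
	unsigned long inum = 0;
	const char *p = strchr(link, ':');

	if (!p || sscanf(p, ":[%lu]", &inum) != 1)
		return 0;
	return inum;
}

/*
 * Hypothetical helper: treat two link targets as naming the same
 * namespace when both parse and their inode numbers are equal.
 */
static int same_ns(const char *a, const char *b)
{
	unsigned long ia = ns_inum(a), ib = ns_inum(b);

	return ia != 0 && ia == ib;
}
```

Fed the two strings printed after unshare(), same_ns() returns 0; fed the
two strings printed after setns(), it returns 1.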
Christian