Message-ID: <d5ecde0c94014a4fad090e44377e9852@EXMBDFT11.ad.twosigma.com>
Date: Wed, 27 May 2020 18:05:55 +0000
From: Nicolas Viennot <Nicolas.Viennot@...sigma.com>
To: Christian Brauner <christian.brauner@...ntu.com>,
Adrian Reber <areber@...hat.com>
CC: "Eric W. Biederman" <ebiederm@...ssion.com>,
Casey Schaufler <casey@...aufler-ca.com>,
Pavel Emelyanov <ovzxemul@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Dmitry Safonov <0x7f454c46@...il.com>,
Andrei Vagin <avagin@...il.com>,
Michał Cłapiński <mclapinski@...gle.com>,
Kamil Yurtsever <kyurtsever@...gle.com>,
"Dirk Petersen" <dipeit@...il.com>,
Christine Flood <chf@...hat.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Radostin Stoyanov <rstoyanov1@...il.com>,
"Cyrill Gorcunov" <gorcunov@...nvz.org>,
Serge Hallyn <serge@...lyn.com>,
"Stephen Smalley" <stephen.smalley.work@...il.com>,
Sargun Dhillon <sargun@...gun.me>,
Arnd Bergmann <arnd@...db.de>,
"linux-security-module@...r.kernel.org"
<linux-security-module@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"selinux@...r.kernel.org" <selinux@...r.kernel.org>,
Eric Paris <eparis@...isplace.org>,
Jann Horn <jannh@...gle.com>
Subject: RE: [PATCH] capabilities: Introduce CAP_RESTORE
> > Also in this thread Kamil mentioned that they also need calling prctl
> > with PR_SET_MM during restore in their production setup.
>
> We're using that as well but it really feels like this:
>
> prctl_map = (struct prctl_mm_map){
> .start_code = start_code,
> .end_code = end_code,
> .start_stack = start_stack,
> .start_data = start_data,
> .end_data = end_data,
> .start_brk = start_brk,
> .brk = brk_val,
> .arg_start = arg_start,
> .arg_end = arg_end,
> .env_start = env_start,
> .env_end = env_end,
> .auxv = NULL,
> .auxv_size = 0,
> .exe_fd = -1,
> };
>
> should belong under ns_capable(CAP_SYS_ADMIN). Why is that necessary to relax?

When using prctl(PR_SET_MM, PR_SET_MM_MAP, ...), the only privileged operation is changing the /proc/self/exe symlink via set_mm_exe_file().
See https://github.com/torvalds/linux/blob/444fc5cde64330661bf59944c43844e7d4c2ccd8/kernel/sys.c#L2001-L2004
That operation requires CAP_SYS_ADMIN in the current user namespace.
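To make this concrete, here is a minimal sketch, assuming Linux, glibc, and a kernel built with CONFIG_CHECKPOINT_RESTORE. It reads the current layout from /proc/self/stat (proc(5) fields 26-28 and 45-51), fills struct prctl_mm_map the same way as the snippet quoted above, and writes it back unchanged with exe_fd = -1. Per the check linked above, this should succeed without any capability. The helper names read_mm_layout() and set_mm_map_unchanged() are hypothetical, and taking brk from sbrk(0) is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill `map` with the current memory layout taken from
 * /proc/self/stat (proc(5) fields 26-28 and 45-51). Returns 0 or -1.
 */
static int read_mm_layout(struct prctl_mm_map *map)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may contain spaces, so parse from the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    unsigned long long v[53] = { 0 };
    int field = 3; /* the first token after ')' is field 3 (state) */
    for (char *tok = strtok(p + 1, " \n"); tok != NULL && field < 53;
         tok = strtok(NULL, " \n"), field++)
        v[field] = strtoull(tok, NULL, 10);
    if (field <= 51)
        return -1; /* fields 45-51 are missing */

    memset(map, 0, sizeof(*map));
    map->start_code = v[26];
    map->end_code = v[27];
    map->start_stack = v[28];
    map->start_data = v[45];
    map->end_data = v[46];
    map->start_brk = v[47];
    map->arg_start = v[48];
    map->arg_end = v[49];
    map->env_start = v[50];
    map->env_end = v[51];
    map->brk = (unsigned long)sbrk(0); /* current program break (assumption) */
    map->auxv = NULL;                  /* auxv_size == 0: auxv is not touched */
    map->auxv_size = 0;
    map->exe_fd = -1;                  /* -1: leave /proc/self/exe unchanged */
    return 0;
}

/*
 * Hypothetical helper: write the unchanged layout back with PR_SET_MM_MAP.
 * Returns 0 on success or a negative errno value.
 */
int set_mm_map_unchanged(void)
{
    struct prctl_mm_map map;
    if (read_mm_layout(&map) != 0)
        return -EIO;
    /* arg4 is the struct size; arg5 must be zero, passed as unsigned long. */
    if (prctl(PR_SET_MM, PR_SET_MM_MAP, (unsigned long)&map,
              (unsigned long)sizeof(map), 0UL) != 0)
        return -errno;
    return 0;
}
```

On a kernel without CONFIG_CHECKPOINT_RESTORE the option is rejected (EPERM for an unprivileged caller, EINVAL otherwise), so the return value should be checked rather than assumed.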
I would argue that the check for setting the current process's exe file should be reduced to a "can you ptrace a child?" check.
Here's why: any process can masquerade as another executable using ptrace.
One can fork a child, ptrace it, have the child execve("target_exe"), then replace its memory contents with an arbitrary program.
With CRIU's libcompel parasite mechanism (https://criu.org/Compel), this is fairly easy to implement.
In fact, we could modify CRIU to do just that (though with a fair amount of effort, given the way CRIU is written)
rather than rely on being able to set the exe file via prctl(). In turn, that would give an easy way to make any process masquerade
as another one, provided that one can ptrace a child.
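The first half of that masquerade can be sketched as follows, assuming Linux and glibc. The parent forks, the child calls PTRACE_TRACEME and execs the target, and the parent waits for the post-exec SIGTRAP. At that point an unprivileged tracer controls a stopped process whose /proc/<pid>/exe already names the target. The function name spawn_traced_exec() is hypothetical, and the memory-replacement step that compel performs is deliberately left out: the sketch only reads the exe link and then kills the child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a traced child that execs `target`, stop it at
 * the post-exec SIGTRAP, and copy its /proc/<pid>/exe link into `exe_out`.
 * Returns 0 on success, -1 on failure. The child never runs past the stop.
 */
int spawn_traced_exec(const char *target, char *exe_out, size_t exe_len)
{
    if (exe_len == 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: become a tracee of the parent, then exec the target. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        char *const argv[] = { (char *)target, NULL };
        execv(target, argv);
        _exit(127); /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* A successful execve() in a PTRACE_TRACEME child raises SIGTRAP. */
    if (!(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)) {
        if (WIFSTOPPED(status)) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
        }
        return -1; /* otherwise the child already exited and was reaped */
    }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/exe", (int)pid);
    ssize_t n = readlink(path, exe_out, exe_len - 1);
    int rc = -1;
    if (n >= 0) {
        exe_out[n] = '\0';
        rc = 0;
    }

    /*
     * Here a compel-style tool would inject a parasite and overwrite the
     * child's memory; this sketch just kills the stopped child and reaps it.
     */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return rc;
}
```

For example, spawn_traced_exec("/bin/sh", buf, sizeof(buf)) fills buf with the resolved path of the shell binary, with no capability involved; Yama's ptrace_scope or a seccomp policy may still make PTRACE_TRACEME fail, in which case the function returns -1.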
When using PR_SET_MM_EXE_FILE directly instead of PR_SET_MM_MAP, CAP_SYS_RESOURCE in the initial user namespace is required:
https://github.com/torvalds/linux/blob/444fc5cde64330661bf59944c43844e7d4c2ccd8/kernel/sys.c#L2109
This seems inconsistent. Also, for some reason, changing auxv is unprivileged through PR_SET_MM_MAP but privileged through PR_SET_MM_AUXV.
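The asymmetry can be summarized as a simplified model of the checks described above (kernel/sys.c at the linked commit). This is not kernel code: the enum, the struct, and mm_op_permitted() are hypothetical names, and the model ignores everything except the capability checks discussed in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the prctl() paths discussed above. */
enum mm_op {
    OP_MM_MAP_NO_EXE,   /* PR_SET_MM_MAP with exe_fd == -1 (auxv may be set) */
    OP_MM_MAP_WITH_EXE, /* PR_SET_MM_MAP with a real exe_fd */
    OP_MM_EXE_FILE,     /* PR_SET_MM, PR_SET_MM_EXE_FILE */
    OP_MM_AUXV,         /* PR_SET_MM, PR_SET_MM_AUXV */
};

/* Hypothetical summary of the caller's capabilities. */
struct caller_caps {
    bool sys_admin_own_userns;  /* ns_capable(current_user_ns(), CAP_SYS_ADMIN) */
    bool sys_resource_init_ns;  /* capable(CAP_SYS_RESOURCE) */
};

/* Returns true if the modeled capability check would let `op` through. */
bool mm_op_permitted(enum mm_op op, struct caller_caps c)
{
    switch (op) {
    case OP_MM_MAP_NO_EXE:
        return true; /* no capability check at all */
    case OP_MM_MAP_WITH_EXE:
        return c.sys_admin_own_userns;
    case OP_MM_EXE_FILE:
    case OP_MM_AUXV:
        return c.sys_resource_init_ns;
    }
    return false;
}
```

For example, a caller holding only CAP_SYS_ADMIN in its own user namespace can replace /proc/self/exe through PR_SET_MM_MAP but not through PR_SET_MM_EXE_FILE, and can pass auxv through PR_SET_MM_MAP without any capability while PR_SET_MM_AUXV is refused.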