[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgNAkjTYu53ji=gP2qXRYpvUEdAP=gxg0BR40JJ54z+XBha-A@mail.gmail.com>
Date: Tue, 18 Oct 2016 10:17:39 +0200
From: "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To: John Stultz <john.stultz@...aro.org>
Cc: Andy Lutomirski <luto@...capital.net>,
lkml <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
Li Zefan <lizefan@...wei.com>,
Jonathan Corbet <corbet@....net>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
Android Kernel Team <kernel-team@...roid.com>,
Rom Lemarchand <romlem@...roid.com>,
Colin Cross <ccross@...roid.com>,
Dmitry Shmidt <dimitrysh@...gle.com>,
Ricky Zhou <rickyz@...omium.org>,
Dmitry Torokhov <dmitry.torokhov@...il.com>,
Todd Kjos <tkjos@...gle.com>,
Christian Poetzsch <christian.potzsch@...tec.com>,
Amit Pundir <amit.pundir@...aro.org>,
"Serge E . Hallyn" <serge@...lyn.com>,
Linux API <linux-api@...r.kernel.org>
Subject: Re: [PATCH] cgroup: Add new capability to allow a process to migrate
other tasks between cgroups
Hi John,
On 18 October 2016 at 01:35, John Stultz <john.stultz@...aro.org> wrote:
> On Mon, Oct 17, 2016 at 3:40 PM, Andy Lutomirski <luto@...capital.net> wrote:
>> On Mon, Oct 17, 2016 at 3:35 PM, John Stultz <john.stultz@...aro.org> wrote:
>>> This patch adds CAP_GROUP_MIGRATE and logic to allows a process
>>> to migrate other tasks between cgroups.
>>>
>>> In Android (where this feature originated), the ActivityManager tracks
>>> various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM,
>>> etc), and then as applications change states, the SchedPolicy logic
>>> will migrate the application tasks between different cgroups used
>>> to control the different application states (for example, there is a
>>> background cpuset cgroup which can limit background tasks to stay
>>> on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can
>>> then further limit those background tasks to a small percentage of
>>> that one cpu's cpu time).
>>>
>>> However, for security reasons, Android doesn't want to make the
>>> system_server (the process that runs the ActivityManager and
>>> SchedPolicy logic), run as root. So in the Android common.git
>>> kernel, they have some logic to allow cgroups to loosen their
>>> permissions so CAP_SYS_NICE tasks can migrate other tasks between
>>> cgroups.
>>>
>>> The approach taken there overloads CAP_SYS_NICE a bit much, and
>>> is maybe more complicated then needed.
>>>
>>> So this patch, as suggested by Tejun, simply adds a new process
>>> capability flag (CAP_CGROUP_MIGRATE), and uses it when checking
>>> if a task can migrate other tasks between cgroups.
>>>
>>> I've tested this with AOSP master (though its a bit hacked in as I
>>> still need to properly get the selinux bits aware of the new
>>> capability bit) with selinux set to permissive and it seems to be
>>> working well.
>>>
>>> Thoughts and feedback would be appreciated!
>>>
>>> Cc: Tejun Heo <tj@...nel.org>
>>> Cc: Li Zefan <lizefan@...wei.com>
>>> Cc: Jonathan Corbet <corbet@....net>
>>> Cc: cgroups@...r.kernel.org
>>> Cc: Android Kernel Team <kernel-team@...roid.com>
>>> Cc: Rom Lemarchand <romlem@...roid.com>
>>> Cc: Colin Cross <ccross@...roid.com>
>>> Cc: Dmitry Shmidt <dimitrysh@...gle.com>
>>> Cc: Ricky Zhou <rickyz@...omium.org>
>>> Cc: Dmitry Torokhov <dmitry.torokhov@...il.com>
>>> Cc: Todd Kjos <tkjos@...gle.com>
>>> Cc: Christian Poetzsch <christian.potzsch@...tec.com>
>>> Cc: Amit Pundir <amit.pundir@...aro.org>
>>> Cc: Serge E. Hallyn <serge@...lyn.com>
>>> Cc: linux-api@...r.kernel.org
>>> Signed-off-by: John Stultz <john.stultz@...aro.org>
>>> ---
>>> v2: Renamed to just CAP_CGROUP_MIGRATE as reccomended by Tejun
>>> ---
>>> include/uapi/linux/capability.h | 5 ++++-
>>> kernel/cgroup.c | 3 ++-
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>>> index 49bc062..44d7ff4 100644
>>> --- a/include/uapi/linux/capability.h
>>> +++ b/include/uapi/linux/capability.h
>>> @@ -349,8 +349,11 @@ struct vfs_cap_data {
>>>
>>> #define CAP_AUDIT_READ 37
>>>
>>> +/* Allow migrating tasks between cgroups */
>>>
>>> -#define CAP_LAST_CAP CAP_AUDIT_READ
>>> +#define CAP_CGROUP_MIGRATE 38
>>> +
>>> +#define CAP_LAST_CAP CAP_CGROUP_MIGRATE
>>>
>>> #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>>>
>>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>>> index 85bc9be..09f84d2 100644
>>> --- a/kernel/cgroup.c
>>> +++ b/kernel/cgroup.c
>>> @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
>>> */
>>> if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
>>> !uid_eq(cred->euid, tcred->uid) &&
>>> - !uid_eq(cred->euid, tcred->suid))
>>> + !uid_eq(cred->euid, tcred->suid) &&
>>> + !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE))
>>> ret = -EACCES;
>>
>> This logic seems rather confused to me. Without this patch, a user
>> can write to procs if it's root *or* it matches the target uid *or* it
>> matches the target suid. How does this make sense? How about
>> ptrace_may_access(...) || ns_capable(tcred->user_ns,
>> CAP_CGROUP_MIGRATE)?
>
> Though ptrace_may_access would open it also to apps with
> CAP_SYS_PTRACE as well, no?
>
> Would pulling out from __ptrace_may_access the:
> if (uid_eq(caller_uid, tcred->euid) &&
> uid_eq(caller_uid, tcred->suid) &&
> uid_eq(caller_uid, tcred->uid) &&
> gid_eq(caller_gid, tcred->egid) &&
> gid_eq(caller_gid, tcred->sgid) &&
> gid_eq(caller_gid, tcred->gid))
> goto ok;
>
> check and creating a new helper that could be shared between them be
> the right approach?
So, is creating a new capability here necessarily the right approach?
Is this operation so unique, or is there an existing silo (not
CAP_SYS_ADMIN) that we can re-use? I ask, because we currently use 38
silos out of a possible 64 capabilities, and when everyone chooses
single-use capabilities, we will quickly exhaust the silos.
I'm not saying that creating a new capability here is wrong, but it is
worth further considering the existing silos to see if there is one
that is a suitable match.
Looking at http://man7.org/linux/man-pages/man7/capabilities.7.html
throws up the following possibilities:
CAP_SYS_NICE
CAP_SYS_PTRACE
CAP_SYS_RESOURCE
I'm aware that you said above that use CAP_SYS_NICE overloads that
capability a bit too much. Maybe it's true, but on the other hand, by
my count from dome rough grepping of the kernel source, there are a
total of 14 capable() checks for CAP_SYS_NICE, out of a total of
around 1256 capable() checks altogether. So, I think this does need to
be balanced against the limited number of silos.
Also, CAP_SYS_RESOURCE deserves consideration (34 uses in capable()
checks). I'd say, since cgroups are about resources, so there's
something of a match there., so it's also worth considering.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Powered by blists - more mailing lists