Message-ID: <54CADDA4.4040602@schaufler-ca.com>
Date: Thu, 29 Jan 2015 17:25:56 -0800
From: Casey Schaufler <casey@...aufler-ca.com>
To: paulmck@...ux.vnet.ibm.com
CC: Iulia Manda <iulia.manda21@...il.com>, gnomes@...rguk.ukuu.org.uk,
serge.hallyn@...onical.com, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, josh@...htriplett.org,
peterz@...radead.org, mhocko@...e.cz,
LSM <linux-security-module@...r.kernel.org>,
Casey Schaufler <casey@...aufler-ca.com>
Subject: Re: [PATCH v2] kernel: Conditionally support non-root users, groups and capabilities

On 1/29/2015 4:32 PM, Paul E. McKenney wrote:
> On Thu, Jan 29, 2015 at 03:44:46PM -0800, Casey Schaufler wrote:
>> On 1/29/2015 10:43 AM, Iulia Manda wrote:
>>> There are a lot of embedded systems that run most or all of their functionality
>>> in init, running as root:root. For these systems, supporting multiple users is
>>> not necessary.
>>>
>>> This patch adds a new symbol, CONFIG_NON_ROOT, that makes support for non-root
>>> users, non-root groups, and capabilities optional.
>>>
>>> When this symbol is not defined, UID and GID are zero in any possible case
>>> and processes always have all capabilities.
>>>
>>> The following syscalls are compiled out: setuid, setregid, setgid,
>>> setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups,
>>> setfsuid, setfsgid, capget, capset.
>>>
>>> Also, groups.c is compiled out completely.
>>>
>>> This change saves about 25 KB on a defconfig build.
>>>
>>> The kernel was booted in Qemu. All the common functionalities work. Adding
>>> users/groups is not possible, failing with -ENOSYS.
>>>
>>> Bloat-o-meter output:
>>> add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
>>>
>>> Signed-off-by: Iulia Manda <iulia.manda21@...il.com>
>>> Reviewed-by: Josh Triplett <josh@...htriplett.org>
>> v2 does nothing to address the longstanding position of
>> the community that disabling the traditional user based
>> access controls is unacceptable.
>>
>> If the community has abandoned that position, and I see no
>> reason to believe that is true, the correct implementation
>> is to rework the LSM from an additional controls model to
>> an authoritative hook model.
>>
>> Speaking of the LSM, what is your expectation regarding the
>> use of security modules in addition to "NON_ROOT"? Is it
>> forbidden, allowed or encouraged?
> I am guessing that people who remove uids and gids from their
> kernels would tend not to add LSM. From what I understand, these
> kernels are designed for special-purpose applications that have
> very limited and stylized interactions with the outside world.
> Applications that, back in the day, would have been written to
> run on bare metal without any OS whatsoever.
Linux is still going to be too big for those applications. Taking
the UID, GID and capability processing out is, at 25k, hardly significant.
Yes, you'll save some processing time, but the benchmarks I've run in the
dim dark past indicated that the impact is actually trivial. I would of
course invite the advocates of this patch to produce numbers. No, if you
are looking to switch from a RTOS to a Linux kernel, UID processing isn't
going to be your first (second, or third) concern.

As for LSMs, I can easily see putting in the security model from the old
RTOS on top of a NON_ROOT configuration. Won't that be fun when the CVEs
start to fly?

Do you think you'll be running system services like systemd on top of this?
Anyone *else* remember what happened when they put capability handling into
sendmail?
>
>> Hacking security code out with ifdefs is a common enough
>> practice, but I like to think the kernel community knows
>> better.
> From what I understand, the alternative in this case is for the
> applications to use some other "OS" that lacks security from the get-go,
> so one can argue that NON_ROOT or MULTIUSER or whatever isn't resulting
> in a net decrease in security.
>
> Thanx, Paul
>
>>> ---
>>> Changes since v1:
>>> - refactor code;
>>> - compile out groups.c;
>>> - if groups_alloc is called, enable NON_ROOT;
>>>
>>> arch/s390/Kconfig | 1 +
>>> drivers/staging/lustre/lustre/Kconfig | 1 +
>>> fs/nfsd/Kconfig | 1 +
>>> include/linux/capability.h | 29 +++++++++++++++++++++++++++
>>> include/linux/cred.h | 23 ++++++++++++++++++----
>>> include/linux/uidgid.h | 12 +++++++++++
>>> init/Kconfig | 19 +++++++++++++++++-
>>> kernel/Makefile | 4 +++-
>>> kernel/capability.c | 35 ++++++++++++++++++---------------
>>> kernel/cred.c | 3 +++
>>> kernel/groups.c | 3 ---
>>> kernel/sys.c | 2 ++
>>> kernel/sys_ni.c | 14 +++++++++++++
>>> net/sunrpc/Kconfig | 2 ++
>>> 14 files changed, 124 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>>> index 68b68d7..b2d2116 100644
>>> --- a/arch/s390/Kconfig
>>> +++ b/arch/s390/Kconfig
>>> @@ -324,6 +324,7 @@ config COMPAT
>>> select COMPAT_BINFMT_ELF if BINFMT_ELF
>>> select ARCH_WANT_OLD_COMPAT_IPC
>>> select COMPAT_OLD_SIGACTION
>>> + select NON_ROOT
>>> help
>>> Select this option if you want to enable your system kernel to
>>> handle system-calls from ELF binaries for 31 bit ESA. This option
>>> diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
>>> index 6725467..b975f62 100644
>>> --- a/drivers/staging/lustre/lustre/Kconfig
>>> +++ b/drivers/staging/lustre/lustre/Kconfig
>>> @@ -10,6 +10,7 @@ config LUSTRE_FS
>>> select CRYPTO_SHA1
>>> select CRYPTO_SHA256
>>> select CRYPTO_SHA512
>>> + select NON_ROOT
>>> help
>>> This option enables Lustre file system client support. Choose Y
>>> here if you want to access a Lustre file system cluster. To compile
>>> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
>>> index 7339515..1a8d6d9 100644
>>> --- a/fs/nfsd/Kconfig
>>> +++ b/fs/nfsd/Kconfig
>>> @@ -6,6 +6,7 @@ config NFSD
>>> select SUNRPC
>>> select EXPORTFS
>>> select NFS_ACL_SUPPORT if NFSD_V2_ACL
>>> + select NON_ROOT
>>> help
>>> Choose Y here if you want to allow other computers to access
>>> files residing on this system using Sun's Network File System
>>> diff --git a/include/linux/capability.h b/include/linux/capability.h
>>> index aa93e5e..601c5de 100644
>>> --- a/include/linux/capability.h
>>> +++ b/include/linux/capability.h
>>> @@ -205,6 +205,7 @@ static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a,
>>> cap_intersect(permitted, __cap_nfsd_set));
>>> }
>>>
>>> +#ifdef CONFIG_NON_ROOT
>>> extern bool has_capability(struct task_struct *t, int cap);
>>> extern bool has_ns_capability(struct task_struct *t,
>>> struct user_namespace *ns, int cap);
>>> @@ -213,6 +214,34 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
>>> struct user_namespace *ns, int cap);
>>> extern bool capable(int cap);
>>> extern bool ns_capable(struct user_namespace *ns, int cap);
>>> +#else
>>> +static inline bool has_capability(struct task_struct *t, int cap)
>>> +{
>>> + return true;
>>> +}
>>> +static inline bool has_ns_capability(struct task_struct *t,
>>> + struct user_namespace *ns, int cap)
>>> +{
>>> + return true;
>>> +}
>>> +static inline bool has_capability_noaudit(struct task_struct *t, int cap)
>>> +{
>>> + return true;
>>> +}
>>> +static inline bool has_ns_capability_noaudit(struct task_struct *t,
>>> + struct user_namespace *ns, int cap)
>>> +{
>>> + return true;
>>> +}
>>> +static inline bool capable(int cap)
>>> +{
>>> + return true;
>>> +}
>>> +static inline bool ns_capable(struct user_namespace *ns, int cap)
>>> +{
>>> + return true;
>>> +}
>>> +#endif /* CONFIG_NON_ROOT */
>>> extern bool capable_wrt_inode_uidgid(const struct inode *inode, int cap);
>>> extern bool file_ns_capable(const struct file *file, struct user_namespace *ns, int cap);
>>>
>>> diff --git a/include/linux/cred.h b/include/linux/cred.h
>>> index 2fb2ca2..08ea5c6 100644
>>> --- a/include/linux/cred.h
>>> +++ b/include/linux/cred.h
>>> @@ -62,9 +62,27 @@ do { \
>>> groups_free(group_info); \
>>> } while (0)
>>>
>>> -extern struct group_info *groups_alloc(int);
>>> extern struct group_info init_groups;
>>> +#ifdef CONFIG_NON_ROOT
>>> +extern struct group_info *groups_alloc(int);
>>> extern void groups_free(struct group_info *);
>>> +
>>> +extern int in_group_p(kgid_t);
>>> +extern int in_egroup_p(kgid_t);
>>> +#else
>>> +static inline void groups_free(struct group_info *group_info)
>>> +{
>>> +}
>>> +
>>> +static inline int in_group_p(kgid_t grp)
>>> +{
>>> + return 1;
>>> +}
>>> +static inline int in_egroup_p(kgid_t grp)
>>> +{
>>> + return 1;
>>> +}
>>> +#endif
>>> extern int set_current_groups(struct group_info *);
>>> extern void set_groups(struct cred *, struct group_info *);
>>> extern int groups_search(const struct group_info *, kgid_t);
>>> @@ -74,9 +92,6 @@ extern bool may_setgroups(void);
>>> #define GROUP_AT(gi, i) \
>>> ((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK])
>>>
>>> -extern int in_group_p(kgid_t);
>>> -extern int in_egroup_p(kgid_t);
>>> -
>>> /*
>>> * The security context of a task
>>> *
>>> diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
>>> index 2d1f9b6..22bd1fa 100644
>>> --- a/include/linux/uidgid.h
>>> +++ b/include/linux/uidgid.h
>>> @@ -29,6 +29,7 @@ typedef struct {
>>> #define KUIDT_INIT(value) (kuid_t){ value }
>>> #define KGIDT_INIT(value) (kgid_t){ value }
>>>
>>> +#ifdef CONFIG_NON_ROOT
>>> static inline uid_t __kuid_val(kuid_t uid)
>>> {
>>> return uid.val;
>>> @@ -38,6 +39,17 @@ static inline gid_t __kgid_val(kgid_t gid)
>>> {
>>> return gid.val;
>>> }
>>> +#else
>>> +static inline uid_t __kuid_val(kuid_t uid)
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> +static inline gid_t __kgid_val(kgid_t gid)
>>> +{
>>> + return 0;
>>> +}
>>> +#endif
>>>
>>> #define GLOBAL_ROOT_UID KUIDT_INIT(0)
>>> #define GLOBAL_ROOT_GID KGIDT_INIT(0)
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 9afb971..dc5bfd4 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -394,6 +394,7 @@ endchoice
>>>
>>> config BSD_PROCESS_ACCT
>>> bool "BSD Process Accounting"
>>> + select NON_ROOT
>>> help
>>> If you say Y here, a user level program will be able to instruct the
>>> kernel (via a special system call) to write process accounting
>>> @@ -420,6 +421,7 @@ config BSD_PROCESS_ACCT_V3
>>> config TASKSTATS
>>> bool "Export task/process statistics through netlink"
>>> depends on NET
>>> + select NON_ROOT
>>> default n
>>> help
>>> Export selected statistics for tasks/processes through the
>>> @@ -1140,6 +1142,7 @@ config CHECKPOINT_RESTORE
>>>
>>> menuconfig NAMESPACES
>>> bool "Namespaces support" if EXPERT
>>> + depends on NON_ROOT
>>> default !EXPERT
>>> help
>>> Provides the way to make tasks work with different objects using
>>> @@ -1352,11 +1355,25 @@ menuconfig EXPERT
>>>
>>> config UID16
>>> bool "Enable 16-bit UID system calls" if EXPERT
>>> - depends on HAVE_UID16
>>> + depends on HAVE_UID16 && NON_ROOT
>>> default y
>>> help
>>> This enables the legacy 16-bit UID syscall wrappers.
>>>
>>> +config NON_ROOT
>>> + bool "Multiple users, groups and capabilities support" if EXPERT
>>> + default y
>>> + help
>>> + This option enables support for non-root users, groups and
>>> + capabilities.
>>> +
>>> + If you say N here, all processes will run with UID 0, GID 0, and all
>>> + possible capabilities. Saying N here also compiles out support for
>>> + system calls related to UIDs, GIDs, and capabilities, such as setuid,
>>> + setgid, and capset.
>>> +
>>> + If unsure, say Y here.
>>> +
>>> config SGETMASK_SYSCALL
>>> bool "sgetmask/ssetmask syscalls support" if EXPERT
>>> def_bool PARISC || MN10300 || BLACKFIN || M68K || PPC || MIPS || X86 || SPARC || CRIS || MICROBLAZE || SUPERH
>>> diff --git a/kernel/Makefile b/kernel/Makefile
>>> index a59481a..d5ca6b8 100644
>>> --- a/kernel/Makefile
>>> +++ b/kernel/Makefile
>>> @@ -9,7 +9,9 @@ obj-y = fork.o exec_domain.o panic.o \
>>> extable.o params.o \
>>> kthread.o sys_ni.o nsproxy.o \
>>> notifier.o ksysfs.o cred.o reboot.o \
>>> - async.o range.o groups.o smpboot.o
>>> + async.o range.o smpboot.o
>>> +
>>> +obj-$(CONFIG_NON_ROOT) += groups.o
>>>
>>> ifdef CONFIG_FUNCTION_TRACER
>>> # Do not trace debug files and internal ftrace files
>>> diff --git a/kernel/capability.c b/kernel/capability.c
>>> index 989f5bf..2638412 100644
>>> --- a/kernel/capability.c
>>> +++ b/kernel/capability.c
>>> @@ -35,6 +35,7 @@ static int __init file_caps_disable(char *str)
>>> }
>>> __setup("no_file_caps", file_caps_disable);
>>>
>>> +#ifdef CONFIG_NON_ROOT
>>> /*
>>> * More recent versions of libcap are available from:
>>> *
>>> @@ -386,6 +387,24 @@ bool ns_capable(struct user_namespace *ns, int cap)
>>> }
>>> EXPORT_SYMBOL(ns_capable);
>>>
>>> +
>>> +/**
>>> + * capable - Determine if the current task has a superior capability in effect
>>> + * @cap: The capability to be tested for
>>> + *
>>> + * Return true if the current task has the given superior capability currently
>>> + * available for use, false if not.
>>> + *
>>> + * This sets PF_SUPERPRIV on the task if the capability is available on the
>>> + * assumption that it's about to be used.
>>> + */
>>> +bool capable(int cap)
>>> +{
>>> + return ns_capable(&init_user_ns, cap);
>>> +}
>>> +EXPORT_SYMBOL(capable);
>>> +#endif /* CONFIG_NON_ROOT */
>>> +
>>> /**
>>> * file_ns_capable - Determine if the file's opener had a capability in effect
>>> * @file: The file we want to check
>>> @@ -412,22 +431,6 @@ bool file_ns_capable(const struct file *file, struct user_namespace *ns,
>>> EXPORT_SYMBOL(file_ns_capable);
>>>
>>> /**
>>> - * capable - Determine if the current task has a superior capability in effect
>>> - * @cap: The capability to be tested for
>>> - *
>>> - * Return true if the current task has the given superior capability currently
>>> - * available for use, false if not.
>>> - *
>>> - * This sets PF_SUPERPRIV on the task if the capability is available on the
>>> - * assumption that it's about to be used.
>>> - */
>>> -bool capable(int cap)
>>> -{
>>> - return ns_capable(&init_user_ns, cap);
>>> -}
>>> -EXPORT_SYMBOL(capable);
>>> -
>>> -/**
>>> * capable_wrt_inode_uidgid - Check nsown_capable and uid and gid mapped
>>> * @inode: The inode in question
>>> * @cap: The capability in question
>>> diff --git a/kernel/cred.c b/kernel/cred.c
>>> index e0573a4..ec1c076 100644
>>> --- a/kernel/cred.c
>>> +++ b/kernel/cred.c
>>> @@ -29,6 +29,9 @@
>>>
>>> static struct kmem_cache *cred_jar;
>>>
>>> +/* init to 2 - one for init_task, one to ensure it is never freed */
>>> +struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
>>> +
>>> /*
>>> * The initial credentials for the initial task
>>> */
>>> diff --git a/kernel/groups.c b/kernel/groups.c
>>> index 664411f..74d431d 100644
>>> --- a/kernel/groups.c
>>> +++ b/kernel/groups.c
>>> @@ -9,9 +9,6 @@
>>> #include <linux/user_namespace.h>
>>> #include <asm/uaccess.h>
>>>
>>> -/* init to 2 - one for init_task, one to ensure it is never freed */
>>> -struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
>>> -
>>> struct group_info *groups_alloc(int gidsetsize)
>>> {
>>> struct group_info *group_info;
>>> diff --git a/kernel/sys.c b/kernel/sys.c
>>> index a8c9f5a..bfe532b 100644
>>> --- a/kernel/sys.c
>>> +++ b/kernel/sys.c
>>> @@ -319,6 +319,7 @@ out_unlock:
>>> * SMP: There are not races, the GIDs are checked only by filesystem
>>> * operations (as far as semantic preservation is concerned).
>>> */
>>> +#ifdef CONFIG_NON_ROOT
>>> SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>>> {
>>> struct user_namespace *ns = current_user_ns();
>>> @@ -809,6 +810,7 @@ change_okay:
>>> commit_creds(new);
>>> return old_fsgid;
>>> }
>>> +#endif /* CONFIG_NON_ROOT */
>>>
>>> /**
>>> * sys_getpid - return the thread group id of the current process
>>> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
>>> index 5adcb0a..7995ef5 100644
>>> --- a/kernel/sys_ni.c
>>> +++ b/kernel/sys_ni.c
>>> @@ -159,6 +159,20 @@ cond_syscall(sys_uselib);
>>> cond_syscall(sys_fadvise64);
>>> cond_syscall(sys_fadvise64_64);
>>> cond_syscall(sys_madvise);
>>> +cond_syscall(sys_setuid);
>>> +cond_syscall(sys_setregid);
>>> +cond_syscall(sys_setgid);
>>> +cond_syscall(sys_setreuid);
>>> +cond_syscall(sys_setresuid);
>>> +cond_syscall(sys_getresuid);
>>> +cond_syscall(sys_setresgid);
>>> +cond_syscall(sys_getresgid);
>>> +cond_syscall(sys_setgroups);
>>> +cond_syscall(sys_getgroups);
>>> +cond_syscall(sys_setfsuid);
>>> +cond_syscall(sys_setfsgid);
>>> +cond_syscall(sys_capget);
>>> +cond_syscall(sys_capset);
>>>
>>> /* arch-specific weak syscall entries */
>>> cond_syscall(sys_pciconfig_read);
>>> diff --git a/net/sunrpc/Kconfig b/net/sunrpc/Kconfig
>>> index fb78117..2b2c471 100644
>>> --- a/net/sunrpc/Kconfig
>>> +++ b/net/sunrpc/Kconfig
>>> @@ -1,9 +1,11 @@
>>> config SUNRPC
>>> tristate
>>> + select NON_ROOT
>>>
>>> config SUNRPC_GSS
>>> tristate
>>> select OID_REGISTRY
>>> + select NON_ROOT
>>>
>>> config SUNRPC_BACKCHANNEL
>>> bool
>
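
The header changes in the quoted patch follow a familiar pattern: with the
option enabled the real function is declared, and with it disabled a
`static inline` stub returns a constant, so callers compile unchanged. A
minimal userspace sketch of that pattern is below. The macro `DEMO_NON_ROOT`
and the names `demo_capable`, `demo_effective_caps`, `DEMO_CAP_SYS_BOOT` and
`may_reboot` are hypothetical and are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical capability number, for illustration only. */
#define DEMO_CAP_SYS_BOOT 22

#ifdef DEMO_NON_ROOT
/* Full build: a toy permission check driven by a bitmask. */
static unsigned long demo_effective_caps;

static inline bool demo_capable(int cap)
{
    return (demo_effective_caps >> cap) & 1UL;
}
#else
/* Stubbed build: every caller is treated as fully privileged. */
static inline bool demo_capable(int cap)
{
    (void)cap;
    return true;
}
#endif

/* Caller code is identical in both configurations. */
bool may_reboot(void)
{
    return demo_capable(DEMO_CAP_SYS_BOOT);
}
```

Built without `-DDEMO_NON_ROOT`, `may_reboot()` is always true, which mirrors
what the patch does to `capable()` and friends when CONFIG_NON_ROOT is off.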