Message-ID: <7b0714e26c7c2216721641d7df16a49687927e37.camel@intel.com>
Date: Fri, 12 Oct 2018 17:04:46 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "jannh@...gle.com" <jannh@...gle.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"keescook@...omium.org" <keescook@...omium.org>,
"jeyu@...nel.org" <jeyu@...nel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"arjan@...ux.intel.com" <arjan@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-mips@...ux-mips.org" <linux-mips@...ux-mips.org>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"kristen@...ux.intel.com" <kristen@...ux.intel.com>,
"Dock, Deneen T" <deneen.t.dock@...el.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"mingo@...hat.com" <mingo@...hat.com>,
"will.deacon@....com" <will.deacon@....com>,
"kernel-hardening@...ts.openwall.com"
<kernel-hardening@...ts.openwall.com>,
"bp@...en8.de" <bp@...en8.de>,
"Hansen, Dave" <dave.hansen@...el.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"arnd@...db.de" <arnd@...db.de>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"sparclinux@...r.kernel.org" <sparclinux@...r.kernel.org>
Subject: Re: [PATCH v2 1/7] modules: Create rlimit for module space
On Fri, 2018-10-12 at 02:35 +0200, Jann Horn wrote:
> On Fri, Oct 12, 2018 at 1:40 AM Rick Edgecombe
> <rick.p.edgecombe@...el.com> wrote:
> > This introduces a new rlimit, RLIMIT_MODSPACE, which limits the amount of
> > module space a user can use. The intention is to be able to limit module
> > space allocations that may come from unprivileged users inserting e/BPF
> > filters.
>
> Note that in some configurations (iirc e.g. the default Ubuntu
> config), normal users can use the subuid mechanism (the /etc/subuid
> config file and the /usr/bin/newuidmap setuid helper) to gain access
> to 65536 UIDs, which means that in such a configuration,
> RLIMIT_MODSPACE*65537 is the actual limit for one user. (Same thing
> applies to RLIMIT_MEMLOCK.)
Ah, that is a problem. There is only room for about 130,000 filters on x86 with
KASLR, so it couldn't really be set small enough.
I'll have to look into what this is. Thanks for pointing it out.
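
To make the arithmetic concrete, here is a rough sketch. It assumes, purely for illustration, that the limit is counted in filters per UID; the helper names are made up and are not part of the patch.

```c
#include <assert.h>

/*
 * Illustration only: both helpers are hypothetical and count in
 * "filters" rather than bytes or pages.
 */

/* Worst-case total for one person who controls `nuids` UIDs. */
static unsigned long effective_limit(unsigned long per_uid_limit,
                                     unsigned long nuids)
{
	assert(nuids > 0);
	return per_uid_limit * nuids;
}

/* Largest per-UID limit that still keeps `nuids` UIDs within `capacity`. */
static unsigned long max_per_uid_limit(unsigned long capacity,
                                       unsigned long nuids)
{
	assert(nuids > 0);
	return capacity / nuids;
}
```

With the numbers from this thread, max_per_uid_limit(130000, 65537) comes out to a single filter per UID, and effective_limit(2, 65537) already exceeds 130,000.
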
> Also, it is probably possible to waste a few times as much virtual
> memory as permitted by the limit by deliberately fragmenting virtual
> memory?
Good point. I guess if the first point can be addressed somehow, this one could
maybe be solved by just picking a lower limit.
Any thoughts on dropping all this and instead having just a system-wide limit
on BPF JIT module space usage?
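
A system-wide limit could be as small as one global counter. The following is a user-space sketch using C11 atomics, not kernel code; jit_pages_used, jit_charge() and jit_uncharge() are hypothetical names, and the limit is passed in as a parameter only to keep the example self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global count of pages handed out to JIT allocations. */
static atomic_long jit_pages_used;

/* Try to charge `pages` against a single system-wide limit. */
static bool jit_charge(long pages, long limit_pages)
{
	assert(pages > 0);
	/* Add first, then check, so two racing callers cannot both pass. */
	if (atomic_fetch_add(&jit_pages_used, pages) + pages > limit_pages) {
		atomic_fetch_sub(&jit_pages_used, pages);
		return false;
	}
	return true;
}

/* Return `pages` to the pool when the allocation is freed. */
static void jit_uncharge(long pages)
{
	atomic_fetch_sub(&jit_pages_used, pages);
}
```

Since nothing here is keyed on a UID, the subuid multiplication would not apply, but one user could then exhaust the budget for everyone else.
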
> > There is unfortunately no cross platform place to perform this accounting
> > during allocation in the module space, so instead two helpers are created to
> > be inserted into the various arch’s that implement module_alloc. These
> > helpers perform the checks and help with tracking. The intention is that
> > they an be added to the various arch’s as easily as possible.
>
> nit: s/an/can/
>
> [...]
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 6746c85511fe..2ef9ed95bf60 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -2110,9 +2110,139 @@ static void free_module_elf(struct module *mod)
> > }
> > #endif /* CONFIG_LIVEPATCH */
> >
> > +struct mod_alloc_user {
> > + struct rb_node node;
> > + unsigned long addr;
> > + unsigned long pages;
> > + kuid_t uid;
> > +};
> > +
> > +static struct rb_root alloc_users = RB_ROOT;
> > +static DEFINE_SPINLOCK(alloc_users_lock);
>
> Why all the rbtree stuff instead of stashing a pointer in struct
> vmap_area, or something like that?
Since the tracking was not for all vmalloc usage, the intention was to not bloat
the structure for other usages like stacks. I thought there usually wouldn't be
nearly as many module space allocations as there would be kernel stacks, but I
didn't do any actual measurements on the tradeoffs.
> [...]
> > +int check_inc_mod_rlimit(unsigned long size)
> > +{
> > + struct user_struct *user = get_current_user();
> > + unsigned long modspace_pages = rlimit(RLIMIT_MODSPACE) >> PAGE_SHIFT;
> > + unsigned long cur_pages = atomic_long_read(&user->module_vm);
> > + unsigned long new_pages = get_mod_page_cnt(size);
> > +
> > + if (rlimit(RLIMIT_MODSPACE) != RLIM_INFINITY
> > + && cur_pages + new_pages > modspace_pages) {
> > + free_uid(user);
> > + return 1;
> > + }
> > +
> > + atomic_long_add(new_pages, &user->module_vm);
> > +
> > + if (atomic_long_read(&user->module_vm) > modspace_pages) {
> > + atomic_long_sub(new_pages, &user->module_vm);
> > + free_uid(user);
> > + return 1;
> > + }
> > +
> > + free_uid(user);
>
> If you drop the reference on the user_struct, an attacker with two
> UIDs can charge module allocations to UID A, keep the associated
> sockets alive as UID B, and then log out and back in again as UID A.
> At that point, nobody is charged for the module space anymore. If you
> look at the eBPF implementation, you'll see that
> bpf_prog_charge_memlock() actually stores a refcounted pointer to the
> user_struct.
Ok, I'll take a look. Thanks Jann.
> > + return 0;
> > +}
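
For reference, the pattern Jann describes (each charged allocation keeps its own counted reference to the accounting object until it is uncharged) can be sketched in user space roughly as below. This is a toy model with C11 atomics, not the kernel's user_struct or the eBPF code; every name in it is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a per-user accounting object. */
struct toy_user {
	atomic_long refcount;   /* keeps the object alive */
	atomic_long module_vm;  /* pages currently charged */
};

/* Each allocation remembers who it was charged to. */
struct toy_alloc {
	struct toy_user *user;  /* counted reference, held until uncharge */
	long pages;
};

static bool toy_charge(struct toy_alloc *a, struct toy_user *u,
                       long pages, long limit_pages)
{
	assert(pages > 0);
	/* Add first, then check, and back out if over the limit. */
	if (atomic_fetch_add(&u->module_vm, pages) + pages > limit_pages) {
		atomic_fetch_sub(&u->module_vm, pages);
		return false;
	}
	atomic_fetch_add(&u->refcount, 1);  /* pin the user for this allocation */
	a->user = u;
	a->pages = pages;
	return true;
}

static void toy_uncharge(struct toy_alloc *a)
{
	atomic_fetch_sub(&a->user->module_vm, a->pages);
	atomic_fetch_sub(&a->user->refcount, 1);  /* drop the pin */
	a->user = NULL;
	a->pages = 0;
}
```

Because the allocation holds the reference, the accounting object cannot go away (and be recreated with a zero count on the next login) while pages are still charged to it.
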