Message-ID: <20091104112632.GA9243@elte.hu>
Date: Wed, 4 Nov 2009 12:26:32 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Neil Horman <nhorman@...driver.com>
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
marcin.slusarz@...il.com, tglx@...utronix.de, mingo@...hat.com,
hpa@...or.com, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0/3] extend get/setrlimit to support setting rlimits
external to a process (v7)
* Neil Horman <nhorman@...driver.com> wrote:
> On Mon, Nov 02, 2009 at 07:51:37PM +0100, Ingo Molnar wrote:
> >
> > * Neil Horman <nhorman@...driver.com> wrote:
> >
> > > > Have you ensured that no rlimit gets propagated during task init
> > > > into some other value - under the previously correct assumption that
> > > > rlimits don't change asynchronously under the feet of tasks?
> > >
> > > I've looked, and the only place that I see the rlim array getting
> > > copied is via copy_signal when we're in the clone path. The
> > > entire rlim array is copied from old task_struct to new
> > > task_struct under the protection of the current->group_leader task
> > > lock, which I also hold when updating via sys_setprlimit, so I
> > > think we're safe in this case.
> >
> > I mean - do we set up any data structure based on a particular
> > rlimit, that can get out of sync with the rlimit being updated?
> >
> > A prominent example would be the stack limit - we base address
> > layout decisions on it. Check arch/x86/mm/mmap.c. RLIM_INFINITY has
> > a special meaning plus we also set mmap_base() based on the rlim.
>
> Ah, I didn't consider those. Yes, it looks like some locking might be
> needed for cases like that. What would you suggest, simply grabbing
> the task lock before looking at the rlim array? That seems a bit
> heavy-handed, especially if we want to use the locking consistently.
> What if we just converted the int array of rlimit to atomic_t's?
> Would that be sufficient, or still too heavy?
The main problem isn't even atomicity (word-sized, naturally aligned
variables are read/written atomically already), but logical coherency and
races: how robust is it to change the rlimit 'under' a task that is
running those VM routines on another CPU right now? How robust is it to
change a task away from RLIM_INFINITY, affecting fundamental properties
of its layout?
The answer might easily be: "it causes no security problems and we don't
care about self-inflicted damage" - but we have to consider each usage
site individually and list them in the changelog, I suspect.
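To make the coherency concern concrete, here is a rough user-space
sketch, only loosely modelled on the layout decisions in
arch/x86/mm/mmap.c. All names and constants (pick_layout, SKETCH_*) are
hypothetical and simplified. The point it illustrates is that the stack
limit is read exactly once and every decision derives from that one
snapshot, so a concurrent update cannot make the "legacy layout" test and
the base computation disagree:

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical constants, loosely resembling a 32-bit address space. */
#define SKETCH_TASK_SIZE 0xC0000000UL
#define SKETCH_MIN_GAP   (128UL * 1024 * 1024)
#define SKETCH_MAX_GAP   (SKETCH_TASK_SIZE / 6 * 5)

struct layout_sketch {
	bool legacy;             /* bottom-up layout chosen? */
	unsigned long mmap_base; /* where mmap allocations start */
};

/*
 * Decide the layout from a stack limit that another thread may change.
 * The limit is loaded once into 'gap'; nothing below re-reads it.
 */
static struct layout_sketch pick_layout(const volatile rlim_t *stack_cur)
{
	struct layout_sketch l;
	rlim_t gap = *stack_cur; /* the only read of the shared value */

	l.legacy = (gap == RLIM_INFINITY);
	if (l.legacy) {
		/* Unlimited stack: fall back to a fixed bottom-up base. */
		l.mmap_base = SKETCH_TASK_SIZE / 3;
		return l;
	}

	/* Clamp the gap reserved for the stack, then place the base. */
	if (gap < SKETCH_MIN_GAP)
		gap = SKETCH_MIN_GAP;
	else if (gap > SKETCH_MAX_GAP)
		gap = SKETCH_MAX_GAP;

	l.mmap_base = SKETCH_TASK_SIZE - (unsigned long)gap;
	return l;
}
```

Had the function tested *stack_cur for RLIM_INFINITY and then loaded it
again for the gap computation, an update landing between the two loads
could produce a layout that matches neither the old nor the new limit.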
I checked some other rlimit uses (the VFS ones) and most of them seemed
to be fine, at first glance.
What we do here is to introduce a completely new mode of access to an
ancient and quite fundamental data structure of the kernel, so I think
all the usage sites and side-effects should be thought through.
I wouldn't go so far as to suggest explicit, heavy-handed locking - _most_
of the uses are single-use. I just wanted to point out the possibilities
that should be considered before we can have warm fuzzy feelings about
your patch.
Maybe a read wrapper that does an ACCESS_ONCE() would be prudent, in
case compilers do something silly in the future.
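A minimal sketch of what such a read wrapper could look like, written as
user-space C so it can be compiled standalone. The ACCESS_ONCE() macro is
the usual volatile-cast idiom (__typeof__ is a GCC/Clang extension);
struct signal_sketch, SKETCH_NLIMITS and rlimit_cur_once() are
hypothetical names used only for illustration, not existing kernel
interfaces:

```c
#include <sys/resource.h>

/*
 * Force exactly one load of x: the volatile cast keeps the compiler
 * from re-reading the location or splitting the access.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical array size and stand-in for the per-process rlim array. */
#define SKETCH_NLIMITS 16

struct signal_sketch {
	struct rlimit rlim[SKETCH_NLIMITS];
};

/* Hypothetical wrapper: one load of the soft limit per call. */
static inline rlim_t rlimit_cur_once(struct signal_sketch *sig,
				     unsigned int which)
{
	return ACCESS_ONCE(sig->rlim[which].rlim_cur);
}
```

A caller would take the value once, e.g.
rlimit_cur_once(sig, RLIMIT_STACK), and base all further decisions on
that local copy. This only addresses compiler behaviour; it does not by
itself solve the logical coherency question raised above.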
> > Also, there appear to be almost no security checks in the new
> > syscall! We look up a PID but that's it - this code will allow
> > unprivileged users to lower various rlimits of system daemons - as
> > if it were their own limit. That's a rather big security hole.
>
> Yeah, I kept all the old checks in place, but didn't consider that
> other processes might need additional security checks. I guess the
> rule needs to be that the caller's uid needs to have CAP_SYS_RESOURCE
> and must match the uid of the process being modified or be 0/root. Is
> that about right?
I think the regular ptrace or signal security checks could be reused
(sans the legacy components).
Those tend to be a (tiny) bit more than just a uid+capability check -
they are a [fse]uid check, i.e. the path of denial should be something
like:
	if ((cred->uid != tcred->euid ||
	     cred->uid != tcred->suid ||
	     cred->uid != tcred->uid  ||
	     cred->gid != tcred->egid ||
	     cred->gid != tcred->sgid ||
	     cred->gid != tcred->gid) &&
	    !capable(CAP_SYS_RESOURCE)) {
		return -EPERM;
	}
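For experimenting with the rule outside the kernel, the same condition
can be modelled in plain C. This is only a sketch under assumptions:
struct cred_sketch and may_set_rlimit() are hypothetical names, and the
capability is passed in as a boolean instead of calling capable():

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, flattened view of the credentials the check compares. */
struct cred_sketch {
	uid_t uid, euid, suid;
	gid_t gid, egid, sgid;
};

/*
 * Return 0 if the caller may change the target's rlimits, -EPERM
 * otherwise. Any single uid/gid mismatch denies the request unless the
 * caller holds the capability (modelled here by 'has_cap').
 */
static int may_set_rlimit(const struct cred_sketch *cred,
			  const struct cred_sketch *tcred,
			  bool has_cap)
{
	bool mismatch = cred->uid != tcred->euid ||
			cred->uid != tcred->suid ||
			cred->uid != tcred->uid  ||
			cred->gid != tcred->egid ||
			cred->gid != tcred->sgid ||
			cred->gid != tcred->gid;

	if (mismatch && !has_cap)
		return -EPERM;
	return 0;
}
```

With this model, a caller whose uid and gid match all six target ids is
allowed, a target running with a different effective uid (say a setuid
program) is refused, and the capability overrides the refusal.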
Ingo