linux-kernel - Re: [PATCH 0/3] extend get/setrlimit to support setting rlimits external to a process (v7)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20091104112632.GA9243@elte.hu>
Date:	Wed, 4 Nov 2009 12:26:32 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Neil Horman <nhorman@...driver.com>
Cc:	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	marcin.slusarz@...il.com, tglx@...utronix.de, mingo@...hat.com,
	hpa@...or.com, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0/3] extend get/setrlimit to support setting rlimits
	external to a process (v7)

* Neil Horman <nhorman@...driver.com> wrote:

> On Mon, Nov 02, 2009 at 07:51:37PM +0100, Ingo Molnar wrote:
> > 
> > * Neil Horman <nhorman@...driver.com> wrote:
> > 
> > > > Have you ensured that no rlimit gets propagated during task init 
> > > > into some other value - under the previously correct assumption that 
> > > > rlimits dont change asynchronously under the feet of tasks?
> > > 
> > > I've looked, and the only place that I see the rlim array getting 
> > > copied is via copy_signal when we're in the clone path.  The 
> > > entire rlim array is copied from old task_struct to new 
> > > task_struct under the protection of the current->group_leader task 
> > > lock, which I also hold when updating via sys_setprlimit, so I 
> > > think we're safe in this case.
> > 
> > I mean - do we set up any data structure based on a particular 
> > rlimit, that can get out of sync with the rlimit being updated?
> > 
> > A prominent example would be the stack limit - we base address 
> > layout decisions on it. Check arch/x86/mm/mmap.c. RLIM_INFINITY has 
> > a special meaning plus we also set mmap_base() based on the rlim.
> 
> Ah, I didn't consider those.  Yes it looks like some locking might be 
> needed for cases like that.  what would you suggest, simply grabbing 
> the task lock before looking at the rlim array?  That seems a bit 
> heavy handed, especially if we want to use the locking consistently.  
> What if we just converted the int array of rlimit to atomic_t's?  
> Would that be sufficient, or still to heavy?

The main problem isnt even atomicity (word sized, naturally aligned 
variables are read/written atomic already), but logical coherency and 
races: how robust is it to change the rlimit 'under' a task that is 
running those VM routines on another CPU right now? How robust is it to 
change a task from RLIM_INFINITY and affecting fundamental properties of 
its layout?

The answer might easily be: "it causes no security problems and we dont 
care about self-inflicted damage" - but we have to consider each usage 
site individually and list them in the changelog i suspect.

I checked some other rlimit uses (the VFS ones) and most of them seemed 
to be fine, at first glance.

What we do here is to introduce a completely new mode of access to an 
ancient and quite fundamental data structure of the kernel, so i think 
all the usage sites and side-effects should be thought through.

I wouldnt go so far to suggest explicit, heavy-handed locking - _most_ 
of the uses are single-use. I just wanted to point out the possibilities 
that should be considered before we can have warm fuzzy feelings about 
your patch.

Maybe a read wrapper that does an ACCESS_ONCE() would be prudent, in 
case compilers do something silly in the future.

> > Also, there appears to be almost no security checks in the new 
> > syscall! We look up a PID but that's it - this code will allow 
> > unprivileged users to lower various rlimits of system daemons - as 
> > if it were their own limit. That's a rather big security hole.
> 
> Yeah, I kept all the old checks in place, but didn't consider that 
> other processes might need additional security checks, I guess the 
> rule needs to be that the callers uid needs to have CAP_SYS_RESOURCE 
> and must match the uid of the process being modified or be 0/root.  Is 
> that about right?

I think the regular ptrace or signal security checks could be reused 
(sans the legacy components).

Those tend to be a (tiny) bit more than just a uid+capability check - 
they are a [fse]uid check, i.e. the path of denial should be something 
like:

        if ((cred->uid != tcred->euid ||
             cred->uid != tcred->suid ||
             cred->uid != tcred->uid  ||
             cred->gid != tcred->egid ||
             cred->gid != tcred->sgid ||
             cred->gid != tcred->gid) &&
            !capable(CAP_SYS_RESOURCE)) {

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/