linux-kernel - Resource limits interface proposal [was: pull request for writable limits]

Date:	Wed, 05 May 2010 14:12:54 +0200
From:	Jiri Slaby <jirislaby@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Alexey Dobriyan <adobriyan@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Neil Horman <nhorman@...driver.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Resource limits interface proposal [was: pull request for writable
 limits]

Hi.

On 03/21/2010 07:38 PM, Linus Torvalds wrote:
> Or even just _one_ system call that takes two pointers, and can do an 
> atomic replace-and-return-the-old-value, like 'sigaction()' does, ie 
> something like
> 
> 	int prlimit64(pid, limit, const struct rlimit64 *new, struct rlimit64 *old);
> 
> wouldn't that be a nice generic interface?

So I ended up considering these possibilities:

1) The internal representation of limits stays as it is in signal_struct,
i.e. long limits with infinity being ~0ul. This is the least intrusive
solution. The new prlimit64 will convert rlimit64 to rlimit and pass it
down to do_prlimit. With setrlimit and getrlimit as mere wrappers, it
will look like:
prlimit64(pid, resource, new64, old64) ->
    new = convert_to_rlim(new64)
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new, old)
    old64 = convert_to_rlim64(old)
setrlimit(resource, rlim) ->
    do_prlimit(current, resource, rlim, NULL)
getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim)
with appropriate copy_{from,to}_user. (And setrlimit+getrlimit will be
scheduled for removal with all the compat crap around them.)
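
The flow above can be modelled in plain userspace C. This is only a sketch: the model_* names, the fixed-size limit table and the cur <= max check are assumptions made for illustration, not the kernel's code, and copy_{from,to}_user is left out entirely.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
#define MODEL_NLIMITS 16

struct model_rlimit {
	unsigned long rlim_cur;
	unsigned long rlim_max;
};

struct model_task {
	struct model_rlimit rlim[MODEL_NLIMITS];
};

/*
 * One worker for both directions: report the previous limit through 'old'
 * (if non-NULL) and install 'new_rlim' (if non-NULL).
 * Returns 0 on success, -1 on a bad argument.
 */
static int model_do_prlimit(struct model_task *tsk, unsigned int resource,
			    const struct model_rlimit *new_rlim,
			    struct model_rlimit *old)
{
	struct model_rlimit tmp = { 0, 0 };

	if (resource >= MODEL_NLIMITS)
		return -1;
	if (new_rlim) {
		/* Copy first so that 'old' may alias 'new_rlim'. */
		tmp = *new_rlim;
		if (tmp.rlim_cur > tmp.rlim_max)
			return -1;
	}
	if (old)
		*old = tsk->rlim[resource];
	if (new_rlim)
		tsk->rlim[resource] = tmp;
	return 0;
}

/* The legacy calls become thin wrappers with one pointer set to NULL. */
static int model_setrlimit(struct model_task *cur, unsigned int resource,
			   const struct model_rlimit *rlim)
{
	return model_do_prlimit(cur, resource, rlim, NULL);
}

static int model_getrlimit(struct model_task *cur, unsigned int resource,
			   struct model_rlimit *rlim)
{
	return model_do_prlimit(cur, resource, NULL, rlim);
}
```

Passing both pointers gives the replace-and-return-the-old-value behaviour from the quoted suggestion; in this single-threaded model nothing makes the swap atomic, which is exactly the part the kernel side has to provide.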

It may also be that rlimit64 will contain flags like:
#define RLIM64_CUR_INFINITY     0x00000001
#define RLIM64_MAX_INFINITY     0x00000002
struct rlimit64 {
        __u64 rlim_cur;
        __u64 rlim_max;
        __u32 flags;
};
if I understood Alexey correctly that limit values should be separated
from infinity? The flags would then be converted to ~0ul when converting
from rlimit64 to rlimit above.

The drawback is that when a 32-bit user passes down a value >= (1ULL << 32),
the call fails with EINVAL.
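
A minimal sketch of that conversion, assuming the flag layout shown above. The *_sketch type names and the function name are hypothetical, and uint32_t stands in for a 32-bit kernel's long so that the overflow case can be exercised on any host.

```c
#include <errno.h>
#include <stdint.h>

#define RLIM64_CUR_INFINITY     0x00000001
#define RLIM64_MAX_INFINITY     0x00000002

/* Hypothetical name; the layout follows the struct proposed above. */
struct rlimit64_sketch {
	uint64_t rlim_cur;
	uint64_t rlim_max;
	uint32_t flags;
};

/* Models struct rlimit on a 32-bit kernel: uint32_t stands in for long. */
struct rlimit32_sketch {
	uint32_t rlim_cur;
	uint32_t rlim_max;
};

#define RLIM32_INFINITY UINT32_MAX	/* ~0ul when long is 32 bits wide */

/*
 * Convert the 64-bit syscall representation to the internal one.
 * Returns 0 on success and -EINVAL when an unflagged value is >= 2^32.
 * 'out' is only written on success.
 */
static int convert_to_rlim_sketch(const struct rlimit64_sketch *in,
				  struct rlimit32_sketch *out)
{
	struct rlimit32_sketch tmp;

	if (in->flags & RLIM64_CUR_INFINITY)
		tmp.rlim_cur = RLIM32_INFINITY;
	else if (in->rlim_cur > UINT32_MAX)
		return -EINVAL;
	else
		tmp.rlim_cur = (uint32_t)in->rlim_cur;

	if (in->flags & RLIM64_MAX_INFINITY)
		tmp.rlim_max = RLIM32_INFINITY;
	else if (in->rlim_max > UINT32_MAX)
		return -EINVAL;
	else
		tmp.rlim_max = (uint32_t)in->rlim_max;

	*out = tmp;
	return 0;
}
```

In this model an unflagged value of exactly 0xffffffff is indistinguishable from infinity once converted; the proposal does not say how that case should be treated, so the sketch lets it through.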

The pros: no locking, no magic, and longs are naturally atomic. The
parameter for sys_prlimit64 is still arch-independent.

2) Introduce an rlimit lock and move every user to rlimit helpers which
lock the accesses appropriately, making the locking a nop when
BITS_PER_LONG == 64. Then we can have rlimit64 in signal_struct and
everything will operate on 64-bit limit values.
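
A userspace sketch of such helpers, under stated assumptions: all names are hypothetical, ULONG_MAX stands in for the BITS_PER_LONG test, and a C11 atomic_flag spin lock stands in for whatever lock the kernel would really use.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for "BITS_PER_LONG == 64". */
#if ULONG_MAX > 0xffffffffUL
#define RLIM_NEEDS_LOCK 0
#else
#define RLIM_NEEDS_LOCK 1
#endif

/* Hypothetical container for one 64-bit limit pair plus its lock. */
struct rlim_box {
	atomic_flag lock;
	uint64_t rlim_cur;
	uint64_t rlim_max;
};

static void rlim_box_init(struct rlim_box *b)
{
	atomic_flag_clear(&b->lock);
	b->rlim_cur = 0;
	b->rlim_max = 0;
}

static inline void rlim_lock(struct rlim_box *b)
{
#if RLIM_NEEDS_LOCK
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&b->lock,
						 memory_order_acquire))
		;
#else
	(void)b;	/* 64-bit longs: the lock compiles away */
#endif
}

static inline void rlim_unlock(struct rlim_box *b)
{
#if RLIM_NEEDS_LOCK
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
#else
	(void)b;
#endif
}

/* Every user goes through accessors instead of touching the fields. */
static uint64_t rlim_read_cur(struct rlim_box *b)
{
	uint64_t v;

	rlim_lock(b);
	v = b->rlim_cur;
	rlim_unlock(b);
	return v;
}

static void rlim_write_cur(struct rlim_box *b, uint64_t v)
{
	rlim_lock(b);
	b->rlim_cur = v;
	rlim_unlock(b);
}
```

The unlocked 64-bit path relies on naturally aligned 64-bit loads and stores not tearing, as the kernel does; strictly by the C11 memory model it is a data race, so this is a model of the idea rather than portable code.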

If we decide to separate infinity from the value with the flags above, we
should also reconsider what infinity will be. Much code simply relies on
rlimit.rlim_{cur,max} holding the highest possible value for infinity and
does not account for something like rlimit64.flags. This will result in
the locks not being a nop on 64-bit, because we want fresh rlim_cur+flags
and rlim_max+flags pairs. We could also have the flags solely in the
syscall interface and treat ~0ULL as infinity internally.

In this case the flow will be:
prlimit64(pid, resource, new64, old64) ->
    tsk = find_task(pid)
    do_prlimit(tsk, resource, new64, old64)
setrlimit(resource, rlim) ->
    rlim64 = convert_to_rlim64(rlim)
    do_prlimit(current, resource, rlim64, NULL)
getrlimit(resource, rlim) ->
    do_prlimit(current, resource, NULL, rlim64)
    rlim = convert_to_rlim(rlim64)

We cannot fail in prlimit64 due to limited space in longs on 32-bit;
however, we have added locking, which may slow things down. I have no idea
how contended the lock will be, but as rlimits are used in the scheduler
and filesystem core, it might affect performance. I can measure this if it
is of interest.

3) [inspired by an idea from Jan Kara, who knows how inode handling works]
This is similar to 2), except that we avoid locks in the same way the
inode->i_size accessors do.

It doesn't solve the case of separate flags though.
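
For reference, the i_size accessors use a sequence counter on 32-bit SMP builds, so readers retry instead of taking a lock. The following is a userspace sketch of that pattern with C11 atomics, not the kernel's seqcount API: the names are hypothetical, the 64-bit value is stored as two 32-bit halves to mimic a 32-bit machine, and writers are assumed to be serialized by the caller.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: one 64-bit limit protected by a sequence counter. */
struct rlim_seq {
	atomic_uint seq;		/* odd while a write is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

static void rlim_seq_init(struct rlim_seq *r)
{
	atomic_init(&r->seq, 0);
	atomic_init(&r->lo, 0);
	atomic_init(&r->hi, 0);
}

/* Writers must not run concurrently with each other (caller's job). */
static void rlim_seq_write(struct rlim_seq *r, uint64_t val)
{
	unsigned int s = atomic_load_explicit(&r->seq, memory_order_relaxed);

	atomic_store_explicit(&r->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->lo, (uint32_t)val, memory_order_relaxed);
	atomic_store_explicit(&r->hi, (uint32_t)(val >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&r->seq, s + 2, memory_order_release);
}

/* Lock-free read: retry if a write was in progress or intervened. */
static uint64_t rlim_seq_read(struct rlim_seq *r)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&r->seq, memory_order_acquire);
		lo = atomic_load_explicit(&r->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&r->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&r->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

The sketch covers a single value only; extending it to a value plus a separate flags word is the open question noted above.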

Just a side note: we cannot use the rlimit64 name, as it is already
reserved in glibc headers for limits handling.

I would appreciate any comments.

thanks,
-- 
js
suse labs