[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100830194806.GC25870@openwall.com>
Date: Mon, 30 Aug 2010 23:48:06 +0400
From: Solar Designer <solar@...nwall.com>
To: Roland McGrath <roland@...hat.com>
Cc: Kees Cook <kees.cook@...onical.com>, linux-kernel@...r.kernel.org,
oss-security@...ts.openwall.com, Al Viro <viro@...iv.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Neil Horman <nhorman@...driver.com>,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] exec argument expansion can inappropriately trigger OOM-killer
On Mon, Aug 30, 2010 at 03:06:16AM -0700, Roland McGrath wrote:
> And I say, if your userland process could really allocate another 200GB,
> then more power to you, you can do it with an exec too. If you could do
> the same with a userland stack allocation, and spend all that time in
> strlen calls and then memcpy, you can do it inside execve too. If it
> takes days, that's what you asked for, and it's your process. It just
> ought to be every bit (or near enough) as preemptible and interruptible
> as that normal userland activity ought to be.
This makes sense to me. However, introducing a new preemption point
may violate assumptions under which the code was written and reviewed
in the past. In the worst case, we'd introduce/expose race conditions
allowing for privilege escalation.
> So, perhaps we want this (count already has a cond_resched in its loop):
Good point re: count() already having this (I think it did not in 2.2).
> @@ -400,6 +403,10 @@ static int copy_strings(int argc, const
> int len;
> unsigned long pos;
>
> + if (signal_pending(current))
> + return -ERESTARTNOINTR;
> + cond_resched();
So, in current kernels, you're making it possible for more kinds of
things to change after prepare_binprm() but before
search_binary_handler(). We'd need to check for possible implications
of this.
I must admit I am not familiar with what additional kinds of things may
change when execution is preempted. This made a significant difference
in some much older kernels (many years ago), but now that the kernel
makes a lot less use of locking most things may be changed by another
CPU even without preemption. So does anyone have a list of what
additional risks we're exposed to, if any, when we allow preemption in
current kernels?
> Has someone reported this BUG_ON failure mode with a reproducer?
64bit_dos.c was supposed to be the reproducer, and I managed to get it
to work (as I've documented in another message earlier today). The
prerequisites appeared to be (some of these might be specific to my
tests, though):
- 64-bit kernel with 32-bit userland support (e.g., CONFIG_IA32_EMULATION);
- 64-bit build of 64bit_dos.c;
- 32-bit build of the target program;
- no dynamic linking in the target program;
- "ulimit -s unlimited" before running the reproducer program;
- over 3 GB of RAM in the system.
> [...] Rather than better enabling OOM killing, I think what really
> makes sense is for the nascent mm to be marked such that allocations in
> it (they'll be from get_arg_page->get_user_pages->handle_mm_fault) just
> fail with ENOMEM before it resorts to the OOM killer (or perhaps even to
> very aggressive pageout). That should percolate back to the execve just
> failing with ENOMEM, which is nicer than OOM kill even if the OOM killer
> actually does pick exactly and only the right target.
I agree.
Thanks,
Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists