[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120802065557.GI6481@ZenIV.linux.org.uk>
Date: Thu, 2 Aug 2012 07:55:57 +0100
From: Al Viro <viro@...IV.linux.org.uk>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Meredydd Luff <meredydd@...atehouse.org>,
linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
Ingo Molnar <mingo@...hat.com>, Jeff Dike <jdike@...toit.com>,
Richard Weinberger <richard@....at>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-arch@...r.kernel.org
Subject: Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)
On Wed, Aug 01, 2012 at 04:30:22PM -0700, H. Peter Anvin wrote:
> On 08/01/2012 04:09 PM, Meredydd Luff wrote:
> >>> #
> >>> # x32-specific system call numbers start at 512 to avoid cache impact
> >>
> >> I think that should be common, not 64 (as should kcmp be).
> >
> > I copied the original execve, which is 64.
> >
>
> Sorry, you're right. The argument vector needs compatibility support.
>
> This means you need an x32 version of the function -- execve
> unfortunately is one of the few system calls which require a special x32
> version (although it's a simple wrapper around sys32_execve). See
> sys_x32_execve.
I *really* strongly object to doing that thing before we sanitize the
situation with sys_execve(). As it is, the damn thing is defined
separately on each architecture, with spectaculary ugly kludges used
in these implementations. Adding a parallel pile of kludges (and
due to their nature, they'll need to be changed in non-trivial
way in a lot of cases) is simply wrong.
The thing is, there's essentially no reason to have more than one
implementation. What they are (badly) doing is "we need to find
pt_regs to pass to do_execve(), the thing we are after has to be near
our stack frame, so let's try to get to it that way". With really
ugly set of kludges trying to do just that.
What we should use instead is task_pt_regs(); maybe introduce
current_pt_regs(), defaulting to task_pt_regs(current) and letting
architectures that can do it better (on some it's simply
available in dedicated register, on some it's better to work
from current_thread_info(), etc.) override the default.
With that we have a fairly good chance to merge most of those
guys; probably not all of them, due to e.g. mips weirdness,
but enough to make it worth doing.
The obstacle is in lazy kernel_execve() implementations;
ones that simply issue a trap/whatever is used to enter
the system call. Directly from kernel space. It doesn't
have to be done that way; see what e.g. arm does there.
Note that doing it without syscall instruction avoids another
headache; namely, we don't have to worry about returning
from *failed* execve (i.e. return to kernel mode) through
the codepath that is normally taken only when returning
to userland.
FWIW, I would try to pull the asm tail of arm kernel_execve()
into something that would look to C side as
ret_from_kernel_exec(®s); /* never returns */
and start converting architectures to that primitive. It should
copy the provided pt_regs to normal location (keeping in mind
that there really might be an overlap), set registers (including
stack pointer) for normal return to user path and jump there.
Essentially, that's the real arch-dependent part of kernel_execve() -
transition from kernel thread to userland process.
It can be done architecture-by-architecture; there's no need to make
it a flagday conversion. Once an arch is handled, we define
something like __ARCH_HAS_RET_FROM_KERNEL_EXEC and get the common
implementations of kernel_execve() and sys_execve() for that -
those could simply live in fs/exec.c under the matching ifdef.
Along with your sys_execveat(). I can probably throw alpha,
arm and x86 conversions into the pile, but it really needs to
be handled on linux-arch, with arch maintainers at least agreeing
in principle with that scheme.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists