linux-kernel - Re: [RFC] status of execve() work - per-architecture patches solicited

Message-ID: <20120917032651.GU13973@ZenIV.linux.org.uk>
Date:	Mon, 17 Sep 2012 04:26:51 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Mark Salter <msalter@...hat.com>
Cc:	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC] status of execve() work - per-architecture patches
 solicited

On Mon, Sep 10, 2012 at 06:20:01PM -0400, Mark Salter wrote:
> C6X works fine with these patches to switch over to generic code.
> 
> 
> Mark Salter (2):
>   c6x: implement ret_from_kernel_execve() and switch to generic
>     kernel_execve()
>   c6x: switch to generic sys_execve()
> 
>  arch/c6x/include/asm/syscalls.h |    5 ---
>  arch/c6x/include/asm/unistd.h   |    3 ++
>  arch/c6x/kernel/entry.S         |   54 +++++++++++++++++---------------------
>  arch/c6x/kernel/process.c       |   22 ----------------
>  4 files changed, 27 insertions(+), 57 deletions(-)

Applied.  There's an alternative variant of that branch; see
#experimental-kernel_thread in the same tree.  I have *not* attempted
to port those patches over there - I don't have anything to test on
and the architecture is too unfamiliar for me to even attempt it blindly.

The main differences between those branches are:
	* ret_from_fork is usually split in two - ret_from_fork
is used for normal processes and ret_from_kernel_thread is its
analog for kernel threads; copy_thread() chooses one to use based
on user_mode(regs).
	* ret_from_kernel_thread does *not* go through the normal
return-from-syscall codepath; instead of doing that it simply
does an equivalent of kernel_thread_helper() itself - i.e. calls
the function we'd passed to kernel_thread(), followed by sys_exit().
	* ret_from_kernel_execve does *not* bother with memmove();
it's done by generic kernel_execve() itself.  Note that the first
two changes guarantee that kernel threads will have pt_regs at the
bottom of their stack, so we won't have any overlaps - not between
the source and destination of copying pt_regs and not between the
stack frame and that destination.  I.e. that copying can safely
be done by generic C implementation of kernel_execve().
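
A rough user-space model of the first two points, purely as illustration.
Nothing here is kernel code: struct model_regs, struct model_task and
every model_* function are hypothetical names, the from_user flag stands
in for user_mode(regs), and passing the function's return value on as the
exit code is an assumption, not something the branch is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved register frame. */
struct model_regs {
    bool from_user;                 /* models user_mode(regs) */
};

typedef int (*thread_fn)(void *);

enum return_path { RET_FROM_FORK, RET_FROM_KERNEL_THREAD };

/* Hypothetical per-task state; the real thing lives in arch code. */
struct model_task {
    enum return_path path;          /* where the child resumes */
    thread_fn fn;                   /* only set for kernel threads */
    void *arg;
    bool exited;
    int exit_code;
};

/* Models copy_thread() picking the return path from user_mode(regs). */
static void model_copy_thread(struct model_task *t,
                              const struct model_regs *regs,
                              thread_fn fn, void *arg)
{
    if (regs->from_user) {
        t->path = RET_FROM_FORK;            /* normal process */
        t->fn = NULL;
        t->arg = NULL;
    } else {
        t->path = RET_FROM_KERNEL_THREAD;   /* kernel thread */
        t->fn = fn;
        t->arg = arg;
    }
    t->exited = false;
    t->exit_code = 0;
}

/*
 * Models ret_from_kernel_thread: no return-from-syscall path, just
 * "call the function, then exit".
 */
static void model_ret_from_kernel_thread(struct model_task *t)
{
    assert(t->path == RET_FROM_KERNEL_THREAD);
    int ret = t->fn(t->arg);
    t->exited = true;                       /* models sys_exit() */
    t->exit_code = ret;                     /* assumption, see above */
}

/* Example payload: bumps a counter and returns a recognizable value. */
static int model_payload(void *arg)
{
    int *counter = arg;
    *counter += 1;
    return 7;
}
```

The point of the split is visible in the model: the kernel-thread path
never touches whatever the RET_FROM_FORK path would do on its way back to
userland.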

I've ported (and tested) execve2 stuff to that model; it's done for
alpha, arm, m68k, s390, powerpc, x86 and um.  I think it's a better
approach:
	* ret_from_kernel_execve() is simpler that way - one argument,
no memmove() call to implement in there.
	* we get to kill the last remnants of "syscall instruction
from the kernel mode" crap (c6x kernel_thread() is free from that
already, but for many architectures it's not so).
	* syscall return codepath is only taken for return to userland
now; succeeding kernel_thread() is not sharing it.  Seeing that a bunch
of things on that path should be avoided when returning to kernel mode,
that allows for nice optimizations and simpler logic in the asm glue.
	* it removes more code.  BTW, right now the contents of
experimental-kernel_thread + for-next sans execve2 counterparts is
probably getting close to Linus' "it removes 1KLoC, piss on all merge window
rules and pull it now" threshold ;-)
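
To make the "no overlaps" argument behind the simpler
ret_from_kernel_execve() concrete, here is a user-space sketch of the
copy, under stated assumptions: the stack grows down, so "bottom" means
the highest addresses of the buffer; the 8192-byte size, the register
layout and all model_* names are hypothetical and not taken from any
architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_STACK_SIZE 8192u      /* hypothetical kernel stack size */

/* Hypothetical register frame; real pt_regs is per-architecture. */
struct model_pt_regs {
    unsigned long pc, sp, gpr[8];
};

struct model_stack {
    unsigned char bytes[MODEL_STACK_SIZE];
};

/* The fixed pt_regs slot at the bottom (highest addresses) of the stack. */
static unsigned char *model_task_pt_regs(struct model_stack *s)
{
    assert(s != NULL);
    return s->bytes + MODEL_STACK_SIZE - sizeof(struct model_pt_regs);
}

/* True if [a, a+alen) and [b, b+blen) share at least one byte. */
static bool model_ranges_overlap(const unsigned char *a, size_t alen,
                                 const unsigned char *b, size_t blen)
{
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    return a0 < b0 + blen && b0 < a0 + alen;
}

/*
 * Models the copy done in C: the freshly built registers sit in a
 * temporary at frame_offset (the caller's stack frame) and are copied
 * into the fixed slot.  A plain memcpy() is only legitimate when the
 * two ranges do not overlap, so refuse otherwise.
 */
static bool model_install_regs(struct model_stack *s, size_t frame_offset,
                               const struct model_pt_regs *new_regs)
{
    const size_t n = sizeof(*new_regs);
    unsigned char *dst = model_task_pt_regs(s);

    if (frame_offset > MODEL_STACK_SIZE - n)
        return false;                   /* temporary would not fit */

    unsigned char *src = s->bytes + frame_offset;
    if (model_ranges_overlap(src, n, dst, n))
        return false;                   /* would need memmove() */

    memcpy(src, new_regs, n);           /* the on-stack temporary */
    memcpy(dst, src, n);                /* safe: ranges are disjoint */
    return true;
}
```

When the slot is guaranteed to sit at the very bottom and the frame is
strictly above it on the stack (lower addresses here), the overlap check
can never fire, which is the property that lets the copy live in generic
C instead of per-architecture asm.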

The price is that kernel threads are in the same boat as userland processes
now wrt kernel stack consumption - they get pt_regs at the bottom of the kernel
stack, same as for normal syscall path.  That makes for _much_ simpler life,
but if there's a kernel thread with really borderline stack footprint, that
might push it over the edge.  Note, however, that syscalls are where the
worst stack footprints tend to happen and for those we can't get rid of
pt_regs on stack, no matter what we do.
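
The stack-budget tradeoff amounts to one subtraction.  A toy check, with
hypothetical names and with every number supplied by the caller rather
than measured on any architecture:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Does a thread with the given worst-case footprint still fit once
 * regs_size bytes at the bottom of the stack are reserved for pt_regs?
 * All three sizes are inputs; none of them comes from a real kernel.
 */
static bool model_thread_fits(size_t stack_size, size_t regs_size,
                              size_t worst_footprint)
{
    assert(stack_size > 0);
    if (worst_footprint > stack_size)
        return false;
    return regs_size <= stack_size - worst_footprint;
}
```

With made-up figures, model_thread_fits(8192, 0, 8100) holds while
model_thread_fits(8192, 168, 8100) does not: only a thread that was
already within a pt_regs-sized margin of the limit is affected.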

Just as with #execve2 it's not a flagday conversion; however, switching
from one to another probably would be messy, so we'd better decide which
one we'll be doing before the merge window.  Comments?