Message-ID: <507E4D40.9050202@snapgear.com>
Date: Wed, 17 Oct 2012 16:16:32 +1000
From: Greg Ungerer <gerg@...pgear.com>
To: Al Viro <viro@...IV.linux.org.uk>
CC: <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
<linux-arch@...r.kernel.org>, David Miller <davem@...emloft.net>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [RFC][CFT][CFReview] execve and kernel_thread unification work
Hi Al,
On 15/10/12 11:30, Al Viro wrote:
> On Mon, Oct 01, 2012 at 10:38:09PM +0100, Al Viro wrote:
>> [with apologies to folks Cc'd, resent due to mis-autoexpanded l-k address
>> on the original posting ;-/ Mea culpa...]
>>
>> There's an interesting ongoing project around kernel_thread() and
>> friends, including execve() variants. I really need help from architecture
>> maintainers on that one; I'd been able to handle (and test) quite a few
>> architectures on my own [alpha, arm, m68k, powerpc, s390, sparc, x86, um]
>> plus two more untested [frv, mn10300]. c6x patches had been supplied by
>> Mark Salter; everything else remains to be done. Right now it's at
>> minus 1.2KLoC, quite a bit of that removed from asm glue and other black magic.
>
> Update:
> * all infrastructure is in mainline now, along with conversion of
> kernel_thread() callbacks to a form that allows a really simple model for
> kernel_execve() _without_ flagday changes.
> * #experimental-kernel_thread is gone; this stuff is in for-next
> now.
> * a lot of architecture conversions had been done and some are
> even tested. Currently missing are only 7 - avr32, hexagon, m32r, openrisc,
> score, tile and xtensa. OTOH, a lot are completely untested. I've put
> per-architecture stuff into separate branches and I promise never to rebase
> those once arch maintainers are OK with the stuff in them. IOW, they'll
> be safe to pull into respective architecture trees.
>
> Folks, *please* review the stuff in signal.git#arch-*. All of them are
> completely independent. I'll be glad to get ACKs/fixes/replacements/etc.
I have checked arch-m68k on ColdFire with and without MMU, and it
is all fine. So for those:
Acked-by: Greg Ungerer <gerg@...inux.org>
Regards
Greg
> I've merged some of those into for-next, but that can change at any time -
> it's not final; for-next will be rebased. Obviously, I hope to get to
> the situation when all of those branches (plus currently missing ones)
> get into shape that satisfies architecture maintainers. Once that happens,
> all those branches will be merged into for-next.
>
> I think the model is about final wrt kernel_thread()/kernel_execve()/
> sys_execve(). There's one possible change on top of it, but it's reasonably
> well-isolated from the rest. As it is, the model to aim for is this:
> * select GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
> * kill local kernel_thread()/kernel_execve() implementations
> * generic kernel_thread() will call your copy_thread() with
> NULL regs and fn/arg passed in the pair of arguments that are blindly
> passed all the way through to copy_thread() - usp and stack_size resp.
> In such case copy_thread() should arrange for the newborn to be woken
> up in a function that is very similar to ret_from_fork(). The only
> difference is that between the call of schedule_tail() and jumping into
> the "return from syscall" code it should call fn(arg), using the data
> left for it by copy_thread().
> * unlike the previous variant, ret_from_kernel_execve() is not
> needed at all; no need to play longjmp()-like games when kernel_thread()
> callbacks had been taught to return normally all the way out when
> kernel_execve() returns 0; any updates of sp/manipulations of register
> windows/etc. will happen without any magic.
> * provide current_pt_regs() if needed. Default is
> task_pt_regs(current), but you might want to optimize it and unlike
> task_pt_regs() it must work whenever we are in syscall or in a kernel thread.
> task_pt_regs(task), OTOH, is required to work only when task can be
> interrogated by tracer.
> * no more syscalls-from-kernel, which often allows for simplifications
> in the syscall entry/exit logic. I haven't done any of those; up to the
> architecture maintainers.
>
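For readers following the model described above, here is a rough userspace sketch of the two paths through copy_thread() and the first wakeup of the child. It is not kernel code: struct child_model, model_copy_thread(), model_first_wakeup() and bump() are hypothetical names made up for illustration, struct pt_regs is a one-field stand-in, and schedule_tail() and the "return from syscall" code are reduced to counters and flags.

```c
/* Userspace model only; nothing here is real kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pt_regs { unsigned long sp; };   /* stand-in for the arch structure */

/* Hypothetical per-child state that copy_thread() would leave behind. */
struct child_model {
	int (*fn)(void *);          /* set only for kernel threads */
	void *arg;
	struct pt_regs regs;        /* userland register image for fork()/clone() */
	int tail_calls;             /* times the schedule_tail() stand-in ran */
	int fn_ret;                 /* what fn(arg) returned */
	int left_via_syscall_exit;  /* reached the "return from syscall" path */
};

/*
 * Models copy_thread(): with regs == NULL the generic kernel_thread()
 * has passed fn and arg through usp and stack_size respectively.
 */
static void model_copy_thread(struct child_model *c, uintptr_t usp,
			      uintptr_t stack_size, const struct pt_regs *regs)
{
	*c = (struct child_model){ 0 };
	if (regs == NULL) {
		assert(usp != 0);
		/* Integer to function pointer: fine on common ABIs, not strict ISO C. */
		c->fn = (int (*)(void *))usp;
		c->arg = (void *)stack_size;
	} else {
		c->regs = *regs;
		if (usp)
			c->regs.sp = usp;   /* clone() with a new user stack */
	}
}

/* Models the first wakeup of the child. */
static void model_first_wakeup(struct child_model *c)
{
	c->tail_calls++;                    /* schedule_tail() stand-in */
	if (c->fn)                          /* kernel thread: call fn(arg) first */
		c->fn_ret = c->fn(c->arg);
	c->left_via_syscall_exit = 1;       /* common "return from syscall" exit */
}

/* Example callback (hypothetical): bumps the int that arg points to. */
static int bump(void *arg)
{
	++*(int *)arg;
	return 0;
}
```

The point of the sketch is that the kernel-thread path differs from the ret_from_fork()-like path only by the fn(arg) call between the schedule_tail() step and the common exit; a callback that returns normally falls through to the same exit as a forked child.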
> One thing to keep in mind is that right now on SMP architectures
> there's the third caller of copy_thread(), besides fork()/clone()/vfork()
> (all pass userland pt_regs, with the address being current_pt_regs()) and
> kernel_thread() (pass NULL pt_regs, kthread creation time). It's fork_idle()
> and it passes zero-filled pt_regs. Frankly, I'm not even sure we want to
> call copy_thread() in that case - the stuff set up by it goes nowhere.
> We do that for each possible secondary CPU on SMP and we do *not* expose
> those threads to scheduler. When CPU gets initialized we have the
> secondary bootstrap take that task_struct as current. Its kernel stack,
> thread_info, etc. are set up by said secondary bootstrap, overriding whatever
> copy_thread() has done. Eventually the bootstrap reaches cpu_idle(),
> which is where we schedule away. switch_to() done by schedule() is what
> completes setting the things up; at that point they are ready to be woken
> up - and not in ret_from_fork(), of course.
> For the majority of architectures nothing done by copy_thread() in
> that case is used afterwards, so we might as well stop calling it when
> copy_process() is called by fork_idle(). I know of only one dubious case -
> powerpc sets thread->ksp_limit on copy_thread() and I'm not sure if
> that gets overwritten in secondary bootstrap - the value would still be
> correct and I don't see any obvious places where it would be reassigned
> on that codepath. There might be other cases like that, though. I would
> argue that for this kind of stuff the right place is arch_dup_task_struct(),
> not copy_thread()... Hell knows. Note that we are pretty much hitting
> the random path in copy_thread() in that case - what zeroed pt_regs look
> like to user_regs() is arch-dependent.
>
> This is the possible change I've mentioned above. Not sure; I'd
> really like comments on that one.
>
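To illustrate the remark above that zero-filled pt_regs land on an arch-dependent path in copy_thread(), here is a small sketch with two made-up architectures. Everything in it is an assumption for illustration: regs_a, regs_b, A_SUPERVISOR and the user_mode_*() helpers are hypothetical and do not correspond to the definitions of any particular port.

```c
/* Illustration only: two hypothetical register layouts, not real ports. */
#include <stdbool.h>
#include <string.h>

/* Hypothetical arch A: a set bit in the status word means supervisor mode. */
struct regs_a { unsigned long status; };
#define A_SUPERVISOR 0x2000UL

static bool user_mode_a(const struct regs_a *r)
{
	return !(r->status & A_SUPERVISOR);
}

/* Hypothetical arch B: privilege level 3 in the low bits means user mode. */
struct regs_b { unsigned long seg; };

static bool user_mode_b(const struct regs_b *r)
{
	return (r->seg & 3) == 3;
}

/* What a zero-filled register image, as passed by fork_idle(), looks like. */
static bool zeroed_looks_user_a(void)
{
	struct regs_a r;

	memset(&r, 0, sizeof(r));
	return user_mode_a(&r);     /* supervisor bit clear: treated as user */
}

static bool zeroed_looks_user_b(void)
{
	struct regs_b r;

	memset(&r, 0, sizeof(r));
	return user_mode_b(&r);     /* privilege level 0: treated as kernel */
}
```

Under these assumptions the same all-zero image is classified as a userland frame on A and as a kernel frame on B, so which branch of copy_thread() runs for the idle task is an accident of encoding.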
> Branches in there:
> arch-blackfin - conversion; completely untested
> arch-cris - conversion; completely untested
> arch-h8300 - conversion; completely untested
> arch-microblaze - conversion; completely untested
> arch-sh - conversion; completely untested
> arch-unicore32 - conversion; completely untested
> arch-ia64 - conversion; tested only on ski, which is worth very little
> arch-c6x - followup to mainline; while it's minor, it's pretty much done
> blindly and *really* needs review by maintainer.
> arch-arm - contains heroic fix by rmk and nothing else. Seems to work fine.
> arch-m68k - minor followup to stuff already in mainline; works on aranym
> arch-parisc - mostly the stuff tested by parisc folks + minor followup
> similar to m68k one.
> arch-s390 - minor followup to mainline; works in hercules
> arch-arm64 - patches from maintainer with minor followup folded
> arch-frv - minor followup to mainline, needs testing
> arch-mn10300 - minor followup to mainline, needs testing
> arch-mips - patches from me and Ralf; works on qemu
> arch-sparc - conversions for sparc32 and sparc64, plus the syscall_noerror
> optimization
> arch-powerpc - minor followups to mainline, need review by maintainers
>
> "Completely untested" in the above reads "no promises it even compiles, let
> alone isn't horribly broken". Please, treat that as a possible starting
> point for doing the conversion for arch in question. I might have misread
> the CPU manuals, your switch_to() implementation, etc., or just have been
> temporarily insane from digging through dozens of architectures. Hopefully
> temporary, that is...
>
> And folks, for pity's sake, do the remaining seven. The merge window is
> over, so...
>
> Al, buggering off to get some VFS work done.
--
------------------------------------------------------------------------
Greg Ungerer -- Principal Engineer EMAIL: gerg@...pgear.com
SnapGear Group, McAfee PHONE: +61 7 3435 2888
8 Gardner Close FAX: +61 7 3217 5323
Milton, QLD, 4064, Australia WEB: http://www.SnapGear.com