[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1259536501.2076.39.camel@pasglop>
Date: Mon, 30 Nov 2009 10:15:01 +1100
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Ananth N Mavinakayanahalli <ananth@...ibm.com>,
Veaceslav Falico <vfalico@...hat.com>,
Paul Mackerras <paulus@...ba.org>,
Alexey Dobriyan <adobriyan@...il.com>,
Christoph Hellwig <hch@...radead.org>,
"Frank Ch. Eigler" <fche@...hat.com>, Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
Roland McGrath <roland@...hat.com>,
linux-kernel@...r.kernel.org, utrace-devel@...hat.com
Subject: Re: powerpc: syscall_dotrace() && retcode (Was: powerpc: fork &&
stepping)
On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
> On 11/28, Ananth N Mavinakayanahalli wrote:
> >
> > syscall-reset is the only failure I see on
> > powerpc:
> >
> > errno 14 (Bad address)
> > syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
> > ()) == 38' failed.
> > unexpected child status 67f
> > FAIL: syscall-reset
>
> (to remind, it also fails without utrace)
>
> Once again, I know nothing about powerc, perhaps I misread the code,
> but I believe this test-case is just wrong on powerpc and should be
> fixed.
>
> On powerpc, syscall_get_nr() returns regs->gpr[0], this means this
> register is used to pass the syscall number.
Correct.
> This matches do_syscall_trace_enter(), it returns regs->gpr[0] as a
> (possibly changed by tracer) syscall nr.
>
> arch/powerpc/kernel/entry_64.S does
>
> syscall_dotrace:
>
> bl .do_syscall_trace_enter
> mr r0,r3 // I guess, r3 = r0 ?
r3 is the return value from a function so this replaces the
syscall number
> ...
> b syscall_dotrace_cont
>
> syscall_dotrace_cont:
>
> syscall_dotrace_cont:
>
> cmpldi 0,r0,NR_syscalls
> bge- syscall_enosys
>
> syscall_enosys:
>
> li r3,-ENOSYS
> b syscall_exit
>
>
> Now return to the test-case, syscall-reset.c. The tracee does
> l = syscall (-23, 1, 2, 3) and stops.
>
> The tracer does
>
> #define RETREG offsetof(struct pt_regs, gpr[0])
> #define NEWVAL ((long) ENOTTY)
>
> l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l);
>
> l == -23, this is correct, note syscall(-23) above.
>
> l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL);
>
> And expects the tracee will see NEWVAL==ENOTTY after return from
> the systame call.
>
> Of course this can't happen. We changed the syscall number, the
> new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly
> returns -EFAULT.
>
> -----------------------------------------------------------------
>
> If I change the test-case to use NEWVAL == 1000 (or any other value
> greater than NR_syscalls), then the tracee sees ENOSYS and this is
> correct too.
>
> But I do not see how it is possible to change the retcode on powerpc.
> Unlike x86, powepc doesn't set -ENOSYS "in advance", before doing
> do_syscall_trace_enter() logic. This means that if the tracer "cancels"
> syscall, r3 will be overwritten by syscall_enosys.
>
> This probably means the kernel should be fixed too, but I am not
> brave enough to change the asm which I can't understand ;)
Yes, the asm should be changed. I suppose we could check if the result
of do_syscall_trace_enter is negative, and if it is, branch to the exit
path using r3 as the error code. Would that be ok ?
Something like this:
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1175a85..7a88c88 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -419,6 +419,9 @@ syscall_dotrace:
stw r0,_TRAP(r1)
addi r3,r1,STACK_FRAME_OVERHEAD
bl do_syscall_trace_enter
+ cmpwi cr0,r3,0
+ blt ret_from_syscall
+
/*
* Restore argument registers possibly just changed.
* We use the return value of do_syscall_trace_enter
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9763267..ec709a7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -240,6 +240,9 @@ syscall_dotrace:
bl .save_nvgprs
addi r3,r1,STACK_FRAME_OVERHEAD
bl .do_syscall_trace_enter
+ cmpdi cr0,r3,0
+ blt syscall_exit
+
/*
* Restore argument registers possibly just changed.
* We use the return value of do_syscall_trace_enter
Cheers,
Ben.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists