[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1259536501.2076.39.camel@pasglop>
Date:	Mon, 30 Nov 2009 10:15:01 +1100
From:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Ananth N Mavinakayanahalli <ananth@...ibm.com>,
	Veaceslav Falico <vfalico@...hat.com>,
	Paul Mackerras <paulus@...ba.org>,
	Alexey Dobriyan <adobriyan@...il.com>,
	Christoph Hellwig <hch@...radead.org>,
	"Frank Ch. Eigler" <fche@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Roland McGrath <roland@...hat.com>,
	linux-kernel@...r.kernel.org, utrace-devel@...hat.com
Subject: Re: powerpc: syscall_dotrace() && retcode (Was: powerpc: fork &&
 stepping)
On Sun, 2009-11-29 at 22:07 +0100, Oleg Nesterov wrote:
> On 11/28, Ananth N Mavinakayanahalli wrote:
> >
> > syscall-reset is the only failure I see on
> > powerpc:
> >
> > errno 14 (Bad address)
> > syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location
> > ()) == 38' failed.
> > unexpected child status 67f
> > FAIL: syscall-reset
> 
> (to remind, it also fails without utrace)
> 
> Once again, I know nothing about powerc, perhaps I misread the code,
> but I believe this test-case is just wrong on powerpc and should be
> fixed.
> 
> On powerpc, syscall_get_nr() returns regs->gpr[0], this means this
> register is used to pass the syscall number.
Correct.
> This matches do_syscall_trace_enter(), it returns regs->gpr[0] as a
> (possibly changed by tracer) syscall nr.
> 
> arch/powerpc/kernel/entry_64.S does
> 
> 	syscall_dotrace:
> 
> 		 bl      .do_syscall_trace_enter
> 		 mr      r0,r3	// I guess, r3 = r0 ?
r3 is the return value from a function so this replaces the
syscall number
> 	 ...
> 		 b       syscall_dotrace_cont
> 
> 	syscall_dotrace_cont:
> 
> 		syscall_dotrace_cont:
> 
> 			cmpldi  0,r0,NR_syscalls
> 			bge-    syscall_enosys
> 
> 	syscall_enosys:
> 
> 		li      r3,-ENOSYS
> 		b       syscall_exit
> 
> 
> Now return to the test-case, syscall-reset.c. The tracee does
> l = syscall (-23, 1, 2, 3) and stops.
> 
> The tracer does
> 
> 	#define RETREG	offsetof(struct pt_regs, gpr[0])
> 	#define NEWVAL	((long) ENOTTY)
> 
> 	l = ptrace(PTRACE_PEEKUSER, child, RETREG, 0l);
> 
> l == -23, this is correct, note syscall(-23) above.
> 
> 	l = ptrace(PTRACE_POKEUSER, child, RETREG, NEWVAL);
> 
> And expects the tracee will see NEWVAL==ENOTTY after return from
> the systame call.
> 
> Of course this can't happen. We changed the syscall number, the
> new value is ENOTTY == 25 == __NR_stime, sys_stime() correctly
> returns -EFAULT.
> 
> -----------------------------------------------------------------
> 
> If I change the test-case to use NEWVAL == 1000 (or any other value
> greater than NR_syscalls), then the tracee sees ENOSYS and this is
> correct too.
> 
> But I do not see how it is possible to change the retcode on powerpc.
> Unlike x86, powepc doesn't set -ENOSYS "in advance", before doing
> do_syscall_trace_enter() logic. This means that if the tracer "cancels"
> syscall, r3 will be overwritten by syscall_enosys.
> 
> This probably means the kernel should be fixed too, but I am not
> brave enough to change the asm which I can't understand ;)
Yes, the asm should be changed. I suppose we could check if the result
of do_syscall_trace_enter is negative, and if it is, branch to the exit
path using r3 as the error code. Would that be ok ?
Something like this:
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 1175a85..7a88c88 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -419,6 +419,9 @@ syscall_dotrace:
 	stw	r0,_TRAP(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	do_syscall_trace_enter
+	cmpwi	cr0,r3,0
+	blt	ret_from_syscall
+
 	/*
 	 * Restore argument registers possibly just changed.
 	 * We use the return value of do_syscall_trace_enter
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9763267..ec709a7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -240,6 +240,9 @@ syscall_dotrace:
 	bl	.save_nvgprs
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	.do_syscall_trace_enter
+	cmpdi	cr0,r3,0
+	blt	syscall_exit
+
 	/*
 	 * Restore argument registers possibly just changed.
 	 * We use the return value of do_syscall_trace_enter
Cheers,
Ben.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
