[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86079b5c-e124-489b-8136-05ae5700cb61@csgroup.eu>
Date: Thu, 23 Jan 2025 23:07:21 +0100
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: "Dmitry V. Levin" <ldv@...ace.io>
Cc: Alexey Gladkov <legion@...nel.org>, Oleg Nesterov <oleg@...hat.com>,
Michael Ellerman <mpe@...erman.id.au>,
Eugene Syromyatnikov <evgsyr@...il.com>, Mike Frysinger <vapier@...too.org>,
Renzo Davoli <renzo@...unibo.it>, Davide Berardi <berardi.dav@...il.com>,
strace-devel@...ts.strace.io, Madhavan Srinivasan <maddy@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>, Naveen N Rao <naveen@...nel.org>,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/7] powerpc: properly negate error in
syscall_set_return_value()
Le 23/01/2025 à 19:28, Dmitry V. Levin a écrit :
> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>
>>>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>>>> syscall_set_return_value()").
>>>>
>>>> There is a clear detailed explanation in that commit of why it needs to
>>>> be done.
>>>>
>>>> If you think that commit is wrong you have to explain why with at least
>>>> the same level of details.
>>>
>>> OK, please have a look whether this explanation is clear and detailed enough:
>>>
>>> =======
>>> powerpc: properly negate error in syscall_set_return_value()
>>>
>>> When syscall_set_return_value() is used to set an error code, the caller
>>> specifies it as a negative value in -ERRORCODE form.
>>>
>>> In !trap_is_scv case the error code is traditionally stored as follows:
>>> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>>> Here are a few examples to illustrate this convention. The first one
>>> is from syscall_get_error():
>>> /*
>>> * If the system call failed,
>>> * regs->gpr[3] contains a positive ERRORCODE.
>>> */
>>> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>>>
>>> The second example is from regs_return_value():
>>> if (is_syscall_success(regs))
>>> return regs->gpr[3];
>>> else
>>> return -regs->gpr[3];
>>>
>>> The third example is from check_syscall_restart():
>>> regs->result = -EINTR;
>>> regs->gpr[3] = EINTR;
>>> regs->ccr |= 0x10000000;
>>>
>>> Compared with these examples, the failure of syscall_set_return_value()
>>> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>>> /*
>>> * In the general case it's not obvious that we must deal with
>>> * CCR here, as the syscall exit path will also do that for us.
>>> * However there are some places, eg. the signal code, which
>>> * check ccr to decide if the value in r3 is actually an error.
>>> */
>>> if (error) {
>>> regs->ccr |= 0x10000000L;
>>> regs->gpr[3] = error;
>>> } else {
>>> regs->ccr &= ~0x10000000L;
>>> regs->gpr[3] = val;
>>> }
>>>
>>> This fix brings syscall_set_return_value() in sync with syscall_get_error()
>>> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>
>>> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>>> =======
>>
>> I think there is still something going wrong.
>>
>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>
>> Then it calls __secure_computing() which returns what __seccomp_filter()
>> returns.
>>
>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>> with a negative value then returns -1
>>
>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>> do_seccomp() doesn't return 0.
>>
>> do_syscall_trace_enter() is called by system_call_exception() and
>> returns -1, so syscall_exception() returns regs->gpr[3]
>>
>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>> called with the return of syscall_exception() as first parameter, which
>> leads to:
>>
>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>> r3 = -r3;
>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>> }
>> }
>>
>> By chance, because you have already changed the sign of gpr[3], the
>> above test fails and nothing is done to r3, and because you have also
>> already set regs->ccr it works.
>>
>> But all this looks inconsistent with the fact that do_seccomp sets
>> -ENOSYS as default value
>>
>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>> syscall number and when it is wrong it goes to skip: which sets
>> regs->gpr[3] = -ENOSYS;
>>
>> So really I think it is not in line with your changes to set positive
>> value in gpr[3].
>>
>> Maybe your change is still correct but it needs to be handled completely
>> in that case.
>
> Indeed, there is an inconsistency in !trap_is_scv case.
>
> In some places such as syscall_get_error() and regs_return_value() the
> semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> therefore cannot be changed.
>
> In some other places like do_seccomp() and do_syscall_trace_enter() the
> semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> returns the system call function return value when it is executed, and
> gpr[3] otherwise. The value returned by system_call_exception() is passed
> on to syscall_exit_prepare() which performs the conversion you mentioned.
>
> What's remarkable is that in those places that are a part of the ABI the
> traditional semantics is kept, while in other places the implementation
> follows the trap_is_scv-like semantics, while traditional semantics is
> also supported there.
>
> The only case where I see some intersection is do_seccomp() where the
> tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> is not the place where the tracer *reads* the system call exit status,
> so whatever was written in gpr[3] before __secure_computing() is not
> really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> this patch applied as well as without it.
>
> After looking at system_call_exception() I doubt this inconsistency can be
> easily avoided, so I don't see how this patch could be enhanced further,
> and what else could I do with the patch besides dropping it and letting
> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> would be unfortunate.
>
>
To add a bit more to the confusion, a task can be flagged with
TIF_NOERROR by calling force_successful_syscall_return(), in which case
even if gpr[3] contains a negative between -MAX_ERRNO and -1 the syscall
will be handled as successfull hence CCR[SO] won't be set. But it seems
this is not handled by syscall_set_return_value(). So what will happen
with time() when approaching year 2036 for instance ?
Powered by blists - more mailing lists