lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1a481785-a874-4b13-8f1d-f7a1f233da50@gmail.com>
Date: Thu, 17 Oct 2024 01:56:52 +0800
From: Celeste Liu <coelacanthushex@...il.com>
To: Alexandre Ghiti <alex@...ti.fr>, linux-riscv@...ts.infradead.org,
 Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
 <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Oleg Nesterov <oleg@...hat.com>, "Dmitry V. Levin" <ldv@...ace.io>
Cc: Andrea Bolognani <abologna@...hat.com>, WANG Xuerui <git@...0n.name>,
 Jiaxun Yang <jiaxun.yang@...goat.com>, Huacai Chen <chenhuacai@...nel.org>,
 Felix Yan <felixonmars@...hlinux.org>, Ruizhe Pan <c141028@...il.com>,
 Shiqi Zhang <shiqi@...c.iscas.ac.cn>, Guo Ren <guoren@...nel.org>,
 Yao Zi <ziyao@...root.org>, Yangyu Chen <cyy@...self.name>,
 Han Gao <gaohan@...as.ac.cn>, linux-kernel@...r.kernel.org,
 rsworktech@...look.com
Subject: Re: [RFC] riscv/entry: issue about a0/orig_a0 register and ENOSYS

On 2024-10-16 21:07, Celeste Liu wrote:

> 
> On 2024-10-16 20:56, Alexandre Ghiti wrote:
>> On 16/10/2024 14:23, Celeste Liu wrote:
>>> On 2024-10-16 20:00, Alexandre Ghiti wrote:
>>>> Hi Celeste,
>>>>
>>>> Thank you for looking into this and really sorry about the late response.
>>>>
>>>> On 17/09/2024 06:09, Celeste Liu wrote:
>>>>> Before PTRACE_GET_SYSCALL_INFO was implemented in v5.3, the only way to
>>>>> get syscall arguments was to get user_regs_struct via PTRACE_GETREGSET.
>>>>> On some architectures where a register is used as both the first
>>>>> argument and the return value and thus will be changed at some stage of
>>>>> the syscall process, something like orig_a0 is provided to save the
>>>>> original argument register value. But RISC-V doesn't export orig_a0 in
>>>>> user_regs_struct (This ABI was designed at e2c0cdfba7f6 ("RISC-V:
>>>>> User-facing API").) so userspace application like strace will get the
>>>>> right or wrong result depends on the operation order in do_trap_ecall_u()
>>>>> function.
>>>>>
>>>>> This requires we put the ENOSYS process after syscall_enter_from_user_mode()
>>>>> or syscall_handler()[1]. Unfortunately, the generic entry API
>>>>> syscall_enter_from_user_mode() requires we
>>>>>
>>>>> *  process ENOSYS before syscall_enter_from_user_mode()
>>>>
>>>> Where does this requirement come from?
>>>>
>>>>
>>>>> *  or only set a0 to ENOSYS when the return value of
>>>>>      syscall_enter_from_user_mode() != -1
>>>>>
>>>>> Again, if we choose the latter way to avoid conflict with the first
>>>>> issue, we will meet the third problem: strace depends on that kernel
>>>>> will return ENOSYS when syscall nr is -1 to implement their syscall
>>>>> tampering feature[2].
>>>>
>>>> IIUC, seccomp and others in syscall_enter_from_user_mode() could return -1 and then we could not differentiate with the syscall number being -1.
>>>>
>>>> But could we imagine, to distinguish between an error and the syscall number being -1, checking again the syscall number after we call syscall_enter_from_user_mode()? If the syscall number is -1, then we set ENOSYS otherwise we don't do anything (a bit like what you did in 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")).
>>>>
>>>> Let me know if I completely misunderstood here!
>>> Yeah. I found this a bit later after I post this RFC. I include it in a update reply,
>>> copy here as well:
>>>
>>>> But from another angle, syscall number is in a7 register, so we can call the
>>>> get_syscall_nr() after calling the syscall_enter_from_user_mode() to bypass the
>>>> information lost of the return value of the syscall_enter_from_user_mode(). But
>>>> in this way, the syscall number in the syscall_enter_from_user_mode() return
>>>> value is useless, and we can remove it directly.
>>> So if we get syscall number from a7 register again, the syscall number part of
>>> the return value of syscall_enter_from_user_mode() is useless completely.
>>> I think it's better to drop it so the later new architecture developer will not
>>> run into the same issue. (Actually, the syscall number returned by
>>> syscall_enter_from_user_mode() is also the result of get_syscall_nr() at the end
>>> of it.) But it will affect other architecture's code so I think there still need
>>> some discussions.
>>>
>>> Or if you think it's better to post a patch and then discuss in patch thread
>>> directly, I'm glad to do this.
>>
>>
>> Great that we have a solution that does not need to change the ABI :)
>>
>> I think we should start by implementing a fix for riscv only that implements the get_syscall_nr() after syscall_enter_from_user_mode() so that we can merge that in 6.12-rcX.
>>
>> And after that, you could come with the nicer solution you propose.
>>
>> Do you think you can send a patch for the quick fix soon? In the meantime, I'm adding the strace testsuite to my CI to make sure it works and we don't break it again :)
> 
> Ok. I will send patch soon.

Patch has been sent. See https://lore.kernel.org/all/20241017-fix-riscv-syscall-nr-v1-1-4edb4ca07f07@gmail.com/T/

> 
>>
>> Thanks!
>>
>> Alex
>>
>>
>>>
>>>> Thanks again for the thorough explanation,
>>>>
>>>> Alex
>>>>
>>>>
>>>>> Actually, we tried the both ways in 52449c17bdd1 ("riscv: entry: set
>>>>> a0 = -ENOSYS only when syscall != -1") and 61119394631f ("riscv: entry:
>>>>> always initialize regs->a0 to -ENOSYS") before.
>>>>>
>>>>> Naturally, there is a solution:
>>>>>
>>>>> 1. Just add orig_a0 in user_regs_struct and let strace use it as
>>>>>      loongarch does. So only two problems, which can be resolved without
>>>>>      conflict, are left here.
>>>>>
>>>>> The conflicts are the direct result of the limitation of generic entry
>>>>> API, so we have another two solutions:
>>>>>
>>>>> 2. Give up the generic entry API, and switch back to the
>>>>>      architecture-specific standardalone implementation.
>>>>> 3. Redesign the generic entry API: the problem was caused by
>>>>>      syscall_enter_from_user_mode() using the value -1 (which is unused
>>>>>      normally) of syscall nr to inform syscall was reject by seccomp/bpf.
>>>>>
>>>>> In theory, the Solution 1 is best:
>>>>>
>>>>> *  a0 was used for two purposes in ABI, so using two variables to store
>>>>>      it is natural.
>>>>> *  Userspace can implement features without depending on the internal
>>>>>      behavior of the kernel.
>>>>>
>>>>> Unfortunately, it's difficult to implement based on the current code.
>>>>> The RISC-V defined struct pt_regs as below:
>>>>>
>>>>>           struct pt_regs {
>>>>>                   unsigned long epc;
>>>>>           ...
>>>>>                   unsigned long t6;
>>>>>                   /* Supervisor/Machine CSRs */
>>>>>                   unsigned long status;
>>>>>                   unsigned long badaddr;
>>>>>                   unsigned long cause;
>>>>>                   /* a0 value before the syscall */
>>>>>                   unsigned long orig_a0;
>>>>>           };
>>>>>
>>>>> And user_regs_struct needs to be a prefix of struct pt_regs, so if we
>>>>> want to include orig_a0 in user_regs_struct, we will need to include
>>>>> Supervisor/Machine CSRs as well. It's not a big problem. Since
>>>>> struct pt_regs is the internal ABI of the kernel, we can reorder it.
>>>>> Unfortunately, struct user_regs_struct is defined as below:
>>>>>
>>>>>           struct user_regs_struct {
>>>>>                   unsigned long pc;
>>>>>           ...
>>>>>                   unsigned long t6;
>>>>>           };
>>>>>
>>>>> It doesn't contain something like reserved[] as padding to leave the
>>>>> space to add more registers from struct pt_regs!
>>>>> The loongarch do the right thing as below:
>>>>>
>>>>>           struct user_pt_regs {
>>>>>                   /* Main processor registers. */
>>>>>                   unsigned long regs[32];
>>>>>           ...
>>>>>                   unsigned long reserved[10];
>>>>>           } __attribute__((aligned(8)));
>>>>>
>>>>> RISC-V can't include orig_a0 in user_regs_struct without breaking UABI.
>>>>>
>>>>> Need a discussion to decide to use which solution, or is there any
>>>>> other better solution?
>>>>>
>>>>> [1]: https://github.com/strace/strace/issues/315
>>>>> [2]: https://lore.kernel.org/linux-riscv/20240627071422.GA2626@altlinux.org/
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@...ts.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ