Date: Thu, 27 Jun 2024 11:43:03 +0200
From: Björn Töpel <bjorn@...osinc.com>
To: Celeste Liu <coelacanthushex@...il.com>
Cc: "Dmitry V. Levin" <ldv@...ace.io>, Palmer Dabbelt <palmer@...osinc.com>, 
	Paul Walmsley <paul.walmsley@...ive.com>, Albert Ou <aou@...s.berkeley.edu>, 
	Guo Ren <guoren@...nel.org>, Conor Dooley <conor.dooley@...rochip.com>, 
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	Andreas Schwab <schwab@...e.de>, David Laight <David.Laight@...lab.com>, 
	Felix Yan <felixonmars@...hlinux.org>, Ruizhe Pan <c141028@...il.com>, 
	Shiqi Zhang <shiqi@...c.iscas.ac.cn>, 
	Emil Renner Berthing <emil.renner.berthing@...onical.com>, "Ivan A. Melnikov" <iv@...linux.org>
Subject: Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1

On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@...il.com> wrote:
>
> On 2024-06-27 15:14, Dmitry V. Levin wrote:
>
> > Hi,
> >
> > On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
> >> When we tested seccomp with a 6.4 kernel, we found that errno had the
> >> wrong value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, after
> >> f0bddf50586d we get ENOSYS instead. We got the same result with commit
> >> 9c2598d43510 ("riscv: entry: Save a0 prior syscall_enter_from_user_mode()").
> >>
> >> After analysing the code, we think that regs->a0 = -ENOSYS should only be
> >> executed when syscall != -1. In __seccomp_filter, when seccomp rejects
> >> a syscall with a specified errno, it sets a0 to the return value as the
> >> syscall ABI requires, and then returns -1. This value is ultimately passed
> >> on as the return value of syscall_enter_from_user_mode, and is then
> >> compared with NR_syscalls after being converted to ulong (so it becomes
> >> ULONG_MAX). The condition syscall < NR_syscalls is therefore always false,
> >> so regs->a0 = -ENOSYS is always executed. This overwrites the a0 set by
> >> seccomp, so we always get ENOSYS when a seccomp RET_ERRNO rule matches.
> >>
> >> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
> >> Reported-by: Felix Yan <felixonmars@...hlinux.org>
> >> Co-developed-by: Ruizhe Pan <c141028@...il.com>
> >> Signed-off-by: Ruizhe Pan <c141028@...il.com>
> >> Co-developed-by: Shiqi Zhang <shiqi@...c.iscas.ac.cn>
> >> Signed-off-by: Shiqi Zhang <shiqi@...c.iscas.ac.cn>
> >> Signed-off-by: Celeste Liu <CoelacanthusHex@...il.com>
> >> Tested-by: Felix Yan <felixonmars@...hlinux.org>
> >> Tested-by: Emil Renner Berthing <emil.renner.berthing@...onical.com>
> >> Reviewed-by: Björn Töpel <bjorn@...osinc.com>
> >> Reviewed-by: Guo Ren <guoren@...nel.org>
> >> ---
> >>
> >> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@...onical.com>
> >> v3 -> v4: use long instead of ulong to reduce type cast and avoid
> >>           implementation-defined behavior, and make the judgment of syscall
> >>           invalid more explicit
> >> v2 -> v3: use if-statement instead of set default value,
> >>           clarify the type of syscall
> >> v1 -> v2: added explanation on why always got ENOSYS
> >>
> >>  arch/riscv/kernel/traps.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> >> index f910dfccbf5d2..729f79c97e2bf 100644
> >> --- a/arch/riscv/kernel/traps.c
> >> +++ b/arch/riscv/kernel/traps.c
> >> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
> >>  asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >>  {
> >>      if (user_mode(regs)) {
> >> -            ulong syscall = regs->a7;
> >> +            long syscall = regs->a7;
> >>
> >>              regs->epc += 4;
> >>              regs->orig_a0 = regs->a0;
> >> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >>
> >>              syscall = syscall_enter_from_user_mode(regs, syscall);
> >>
> >> -            if (syscall < NR_syscalls)
> >> +            if (syscall >= 0 && syscall < NR_syscalls)
> >>                      syscall_handler(regs, syscall);
> >> -            else
> >> +            else if (syscall != -1)
> >>                      regs->a0 = -ENOSYS;
> >>
> >>              syscall_exit_to_user_mode(regs);
> >
> > Unfortunately, this change introduced a regression: it broke strace
> > syscall tampering on riscv.  When the tracer changes syscall number to -1,
> > the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
> > return the error code of the failed syscall to userspace.
>
> Patch v2 [1] actually did the right thing. But following Björn Töpel's
> suggestion, and because we found that converting between ulong and long
> can involve implementation-defined behavior in C, we changed it to the
> current form. So reverting this patch and applying patch v2 should fix
> this issue; patch v2 takes the same approach as other architectures.
>
> [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/

Rather than reverting, how about a fix that makes sure a0 is initialized
to -ENOSYS up front, e.g.:

--8<--
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05a16b1f0aee..51ebfd23e007 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
+		regs->a0 = -ENOSYS;
 
 		riscv_v_vstate_discard(regs);
 
@@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		if (syscall >= 0 && syscall < NR_syscalls)
 			syscall_handler(regs, syscall);
-		else if (syscall != -1)
-			regs->a0 = -ENOSYS;
+
 		/*
 		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
 		 * so the maximum stack offset is 1k bytes (10 bits).
--8<--

Celeste, do you want to cook that fix properly?


Björn
