Message-Id: <DAUSD38QIV6D.1YO5ASNI3EUGV@ventanamicro.com>
Date: Tue, 24 Jun 2025 15:09:09 +0200
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "Palmer Dabbelt" <palmer@...belt.com>
Cc: <linux-riscv@...ts.infradead.org>, <linux-kernel@...r.kernel.org>, "Paul
 Walmsley" <paul.walmsley@...ive.com>, <aou@...s.berkeley.edu>, "Alexandre
 Ghiti" <alex@...ti.fr>, "Atish Patra" <atishp@...osinc.com>,
 <ajones@...tanamicro.com>, <cleger@...osinc.com>,
 <apatel@...tanamicro.com>, <thomas.weissschuh@...utronix.de>,
 <david.laight.linux@...il.com>
Subject: Re: [PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints

2025-06-23T15:54:00-07:00, Palmer Dabbelt <palmer@...belt.com>:
> Having patch 3 of 2 is not normal.

Sorry, I wanted to distinguish it from the original series without
sending a new one, because it's quite a radical proposal that I don't
necessarily want to get merged.
Would "[RFC 3/2]", "[RFC 3/3]", or something else look better while
raising the same alarms?

> On Thu, 19 Jun 2025 12:03:15 PDT (-0700), rkrcmar@...tanamicro.com wrote:
> So the issue is the extra save/restore on function entry?  That's the 
> sort of thing shrink wrapping is supposed to help with.  It's been 
> implemented in GCC for a while, but I'm not sure how well it's been 
> pushed on (IIRC it was just one of the SPEC workloads).

Yes, shrink wrapping could help if compilers can figure out what to do
with static_keys. It's hopefully going to sort itself out in the future.
We'd ideally have some way to tell the compiler to always keep the
tracepoints inside their branches, to make them less fragile, but that
is probably asking too much from C.
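
As a rough userspace illustration of what "keeping the tracepoint inside
its branch" could look like, here is a minimal sketch.  Everything in it
is an assumption for the sake of the example: trace_enabled,
trace_call_slowpath and do_call are made-up names, a plain global flag
stands in for a static_key, and the noinline/cold attributes are
GCC/Clang extensions, not what the kernel tracepoint macros expand to.

```c
#include <stdint.h>

/* Hypothetical stand-in for a static_key: a plain flag, normally 0. */
static volatile int trace_enabled;

/* Counts slow-path invocations so the sketch has an observable effect. */
static unsigned long trace_hits;

/*
 * Hypothetical tracing slow path.  noinline + cold ask the compiler to
 * keep the body out of line, so whatever registers it needs are its own
 * problem and not the caller's.
 */
__attribute__((noinline, cold))
static void trace_call_slowpath(uintptr_t ext, uintptr_t fid)
{
	(void)ext;
	(void)fid;
	trace_hits++;
}

/* Hypothetical traced operation; the addition stands in for the ecall. */
static uintptr_t do_call(uintptr_t ext, uintptr_t fid,
			 uintptr_t a0, uintptr_t a1)
{
	/* Fast path: a single branch that is expected not to be taken. */
	if (__builtin_expect(trace_enabled, 0))
		trace_call_slowpath(ext, fid);

	return a0 + a1;
}
```

Even in this form the compiler still has to keep a0 and a1 alive across
the conditional call, and that is the spot where shrink wrapping decides
whether the saves end up in the prologue or only inside the branch.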

I think GCC 15.1 had some shrink-wrapping improvements, but I've only
been using 14.3 so far...

> That said, this is kind of hard to reason about.  Can you pull out a 
> smaller example?

I posted an example of the original 8 argument ecall in v1:
https://lore.kernel.org/linux-riscv/20250612145754.2126147-2-rkrcmar@ventanamicro.com/T/#m1d441ab3de3e6d6b3b8d120b923f2e2081918a98
For another example, let's have the following function:

  struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
  {
    return sbi_ecall(123, 456, a0, a1);
  }

The disassembly without tracepoints (with -fno-omit-frame-pointer):
(It could have been just "li;li;ecall;ret" without the frame pointer.)

   0xffffffff80016d48 <+0>:	addi	sp,sp,-16
   0xffffffff80016d4a <+2>:	sd	ra,8(sp)
   0xffffffff80016d4c <+4>:	sd	s0,0(sp)
   0xffffffff80016d4e <+6>:	addi	s0,sp,16
   0xffffffff80016d50 <+8>:	li	a7,123
   0xffffffff80016d54 <+12>:	li	a6,456
   0xffffffff80016d58 <+16>:	ecall
   0xffffffff80016d5c <+20>:	ld	ra,8(sp)
   0xffffffff80016d5e <+22>:	ld	s0,0(sp)
   0xffffffff80016d60 <+24>:	addi	sp,sp,16
   0xffffffff80016d62 <+26>:	ret

With tracepoints, the situation is worse... the optimal outcome would
add two nops, but the actual result is:

   0xffffffff80017720 <+0>:	addi	sp,sp,-48
   0xffffffff80017722 <+2>:	sd	ra,40(sp)
   0xffffffff80017724 <+4>:	sd	s0,32(sp)
   0xffffffff80017726 <+6>:	sd	s1,24(sp)
   0xffffffff80017728 <+8>:	sd	s2,16(sp)
   0xffffffff8001772a <+10>:	sd	s3,8(sp)
   0xffffffff8001772c <+12>:	addi	s0,sp,48
   0xffffffff8001772e <+14>:	nop
   0xffffffff80017730 <+16>:	nop
   0xffffffff80017734 <+20>:	li	a7,123
   0xffffffff80017738 <+24>:	li	a6,456
   0xffffffff8001773c <+28>:	ecall
   0xffffffff80017740 <+32>:	nop
   0xffffffff80017744 <+36>:	ld	ra,40(sp)
   0xffffffff80017746 <+38>:	ld	s0,32(sp)
   0xffffffff80017748 <+40>:	ld	s1,24(sp)
   0xffffffff8001774a <+42>:	ld	s2,16(sp)
   0xffffffff8001774c <+44>:	ld	s3,8(sp)
   0xffffffff8001774e <+46>:	addi	sp,sp,48
   0xffffffff80017750 <+48>:	ret
   [Tracing slowpath continues to 202.]

i.e. we spill 3 extra registers, which is at least better than v1.  I'll
try again with GCC 15.1 and get back to you if it actually improves the
situation.
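
To connect the register counts to the frame sizes in the two listings,
here is a small sketch of the arithmetic.  It assumes 8-byte registers
(RV64) and the 16-byte stack alignment of the standard RISC-V calling
convention; frame_bytes is a made-up helper name.

```c
#include <stddef.h>

/* RV64 register width in bytes. */
#define XLEN_BYTES 8
/* Stack alignment assumed from the standard RISC-V calling convention. */
#define STACK_ALIGN 16

/* Hypothetical helper: stack bytes needed to spill nregs registers. */
static size_t frame_bytes(size_t nregs)
{
	size_t raw = nregs * XLEN_BYTES;

	/* Round up to the next multiple of STACK_ALIGN. */
	return (raw + STACK_ALIGN - 1) & ~(size_t)(STACK_ALIGN - 1);
}
```

With ra and s0 only, frame_bytes(2) is 16, matching "addi sp,sp,-16".
With s1, s2 and s3 added, frame_bytes(5) rounds 40 up to 48, matching
"addi sp,sp,-48".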
