lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DAVG8M70SJ4Q.ZSTC5VSJWGSK@ventanamicro.com>
Date: Wed, 25 Jun 2025 09:51:45 +0200
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "Palmer Dabbelt" <palmer@...belt.com>
Cc: <linux-riscv@...ts.infradead.org>, <linux-kernel@...r.kernel.org>, "Paul
 Walmsley" <paul.walmsley@...ive.com>, <aou@...s.berkeley.edu>, "Alexandre
 Ghiti" <alex@...ti.fr>, "Atish Patra" <atishp@...osinc.com>,
 <ajones@...tanamicro.com>, <cleger@...osinc.com>,
 <apatel@...tanamicro.com>, <thomas.weissschuh@...utronix.de>,
 <david.laight.linux@...il.com>, "Jeff Law" <jlaw@...tanamicro.com>
Subject: Re: [PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints

2025-06-24T15:09:09+02:00, Radim Krčmář <rkrcmar@...tanamicro.com>:
> For another example, let's have the following function:
>
>   struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
>   {
>     return sbi_ecall(123, 456, a0, a1);
>   }
>
> The disassembly without tracepoints (with -fno-omit-frame-pointer):
> (It could have been just "li;li;ecall;ret" without frame pointer.)
>
>    0xffffffff80016d48 <+0>:	addi	sp,sp,-16
>    0xffffffff80016d4a <+2>:	sd	ra,8(sp)
>    0xffffffff80016d4c <+4>:	sd	s0,0(sp)
>    0xffffffff80016d4e <+6>:	addi	s0,sp,16
>    0xffffffff80016d50 <+8>:	li	a7,123
>    0xffffffff80016d54 <+12>:	li	a6,456
>    0xffffffff80016d58 <+16>:	ecall
>    0xffffffff80016d5c <+20>:	ld	ra,8(sp)
>    0xffffffff80016d5e <+22>:	ld	s0,0(sp)
>    0xffffffff80016d60 <+24>:	addi	sp,sp,16
>    0xffffffff80016d62 <+26>:	ret
>
> [ Removed previous disassembly with tracepoints. ]
>                                                                I'll try
> again with GCC 15.1, and get back if it actually improves the situation.

GCC 15.1 still leaves "mv" outside the branch, but at least seems to be
on the right track (undesired overhead is marked with leading stars):

   0xffffffff800236e8 <+0>:	addi	sp,sp,-48
   0xffffffff800236ea <+2>:	sd	s0,32(sp)
   0xffffffff800236ec <+4>:	sd	ra,40(sp)
   0xffffffff800236ee <+6>:	addi	s0,sp,48
*  0xffffffff800236f0 <+8>:	mv	a4,a0
*  0xffffffff800236f2 <+10>:	mv	a5,a1
   0xffffffff800236f4 <+12>:	nop
*  0xffffffff800236f8 <+16>:	mv	a0,a4
*  0xffffffff800236fa <+18>:	mv	a1,a5
   0xffffffff800236fc <+20>:	li	a7,123
   0xffffffff80023700 <+24>:	li	a6,456
   0xffffffff80023704 <+28>:	ecall
*  0xffffffff80023708 <+32>:	mv	a5,a0
*  0xffffffff8002370a <+34>:	mv	a2,a1
   0xffffffff8002370c <+36>:	nop
   0xffffffff80023710 <+40>:	ld	ra,40(sp)
   0xffffffff80023712 <+42>:	ld	s0,32(sp)
*  0xffffffff80023714 <+44>:	mv	a0,a5
*  0xffffffff80023716 <+46>:	mv	a1,a2
   0xffffffff80023718 <+48>:	addi	sp,sp,48
   0xffffffff8002371a <+50>:	ret
   [Tracing goes to +126]

I realized I had the environment configured for clang in the last mail,
so here is actual GCC 14.3, which also spills in the prologue:

   0xffffffff80023360 <+0>:	addi	sp,sp,-48
   0xffffffff80023362 <+2>:	sd	s0,32(sp)
*  0xffffffff80023364 <+4>:	sd	s1,24(sp)
*  0xffffffff80023366 <+6>:	sd	s2,16(sp)
   0xffffffff80023368 <+8>:	sd	ra,40(sp)
   0xffffffff8002336a <+10>:	addi	s0,sp,48
*  0xffffffff8002336c <+12>:	mv	s2,a0
*  0xffffffff8002336e <+14>:	mv	s1,a1
   0xffffffff80023370 <+16>:	nop
*  0xffffffff80023374 <+20>:	mv	a0,s2
*  0xffffffff80023376 <+22>:	mv	a1,s1
   0xffffffff80023378 <+24>:	li	a7,123
   0xffffffff8002337c <+28>:	li	a6,456
   0xffffffff80023380 <+32>:	ecall
*  0xffffffff80023384 <+36>:	mv	s2,a0
*  0xffffffff80023386 <+38>:	mv	s1,a1
   0xffffffff80023388 <+40>:	nop
   0xffffffff8002338c <+44>:	ld	ra,40(sp)
   0xffffffff8002338e <+46>:	ld	s0,32(sp)
*  0xffffffff80023390 <+48>:	mv	a0,s2
*  0xffffffff80023392 <+50>:	mv	a1,s1
*  0xffffffff80023394 <+52>:	ld	s2,16(sp)
*  0xffffffff80023396 <+54>:	ld	s1,24(sp)
   0xffffffff80023398 <+56>:	addi	sp,sp,48
   0xffffffff8002339a <+58>:	ret
   [Tracing goes to +108]

And clang in the last mail inlined the tracepoints, because I put the
example function in sbi_ecall.c, which bloated the tracing slowpath, and
spilled one more register than needed.
With the function in sbi.c, to better simulate actual use (gcc examples
are already doing this), clang 20.1.6 and 19.1.7 do:

   0xffffffff80016f08 <+0>:	addi	sp,sp,-32
   0xffffffff80016f0a <+2>:	sd	ra,24(sp)
   0xffffffff80016f0c <+4>:	sd	s0,16(sp)
*  0xffffffff80016f0e <+6>:	sd	s1,8(sp)
*  0xffffffff80016f10 <+8>:	sd	s2,0(sp)
   0xffffffff80016f12 <+10>:	addi	s0,sp,32
   0xffffffff80016f14 <+12>:	nop
   0xffffffff80016f18 <+16>:	li	a7,123
   0xffffffff80016f1c <+20>:	li	a6,456
   0xffffffff80016f20 <+24>:	ecall
   0xffffffff80016f24 <+28>:	nop
   0xffffffff80016f28 <+32>:	ld	ra,24(sp)
   0xffffffff80016f2a <+34>:	ld	s0,16(sp)
*  0xffffffff80016f2c <+36>:	ld	s1,8(sp)
*  0xffffffff80016f2e <+38>:	ld	s2,0(sp)
   0xffffffff80016f30 <+40>:	addi	sp,sp,32
   0xffffffff80016f32 <+42>:	ret
   [Tracing goes to +94]

When compared to GCC 15.1, clang spills in the prologue, but doesn't
store around the static branch sites.  The optimal result would be a
combination of what clang and GCC 15.1 do (code without any stars).

When I looked at real code samples, the behavior was roughly similar.
GCC just wasn't always placing the "mv"s as obviously.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ