[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DAVG8M70SJ4Q.ZSTC5VSJWGSK@ventanamicro.com>
Date: Wed, 25 Jun 2025 09:51:45 +0200
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "Palmer Dabbelt" <palmer@...belt.com>
Cc: <linux-riscv@...ts.infradead.org>, <linux-kernel@...r.kernel.org>, "Paul
Walmsley" <paul.walmsley@...ive.com>, <aou@...s.berkeley.edu>, "Alexandre
Ghiti" <alex@...ti.fr>, "Atish Patra" <atishp@...osinc.com>,
<ajones@...tanamicro.com>, <cleger@...osinc.com>,
<apatel@...tanamicro.com>, <thomas.weissschuh@...utronix.de>,
<david.laight.linux@...il.com>, "Jeff Law" <jlaw@...tanamicro.com>
Subject: Re: [PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints
2025-06-24T15:09:09+02:00, Radim Krčmář <rkrcmar@...tanamicro.com>:
> For another example, let's have the following function:
>
> struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
> {
> return sbi_ecall(123, 456, a0, a1);
> }
>
> The disassembly without tracepoints (with -fno-omit-frame-pointer):
> (It could have been just "li;li;ecall;ret" without frame pointer.)
>
> 0xffffffff80016d48 <+0>: addi sp,sp,-16
> 0xffffffff80016d4a <+2>: sd ra,8(sp)
> 0xffffffff80016d4c <+4>: sd s0,0(sp)
> 0xffffffff80016d4e <+6>: addi s0,sp,16
> 0xffffffff80016d50 <+8>: li a7,123
> 0xffffffff80016d54 <+12>: li a6,456
> 0xffffffff80016d58 <+16>: ecall
> 0xffffffff80016d5c <+20>: ld ra,8(sp)
> 0xffffffff80016d5e <+22>: ld s0,0(sp)
> 0xffffffff80016d60 <+24>: addi sp,sp,16
> 0xffffffff80016d62 <+26>: ret
>
> [ Removed previous disassembly with tracepoints. ]
> I'll try
> again with GCC 15.1, and get back if it actually improves the situation.
GCC 15.1 still leaves "mv" outside the branch, but at least seems to be
on the right track (undesired overhead is marked with leading stars):
0xffffffff800236e8 <+0>: addi sp,sp,-48
0xffffffff800236ea <+2>: sd s0,32(sp)
0xffffffff800236ec <+4>: sd ra,40(sp)
0xffffffff800236ee <+6>: addi s0,sp,48
* 0xffffffff800236f0 <+8>: mv a4,a0
* 0xffffffff800236f2 <+10>: mv a5,a1
0xffffffff800236f4 <+12>: nop
* 0xffffffff800236f8 <+16>: mv a0,a4
* 0xffffffff800236fa <+18>: mv a1,a5
0xffffffff800236fc <+20>: li a7,123
0xffffffff80023700 <+24>: li a6,456
0xffffffff80023704 <+28>: ecall
* 0xffffffff80023708 <+32>: mv a5,a0
* 0xffffffff8002370a <+34>: mv a2,a1
0xffffffff8002370c <+36>: nop
0xffffffff80023710 <+40>: ld ra,40(sp)
0xffffffff80023712 <+42>: ld s0,32(sp)
* 0xffffffff80023714 <+44>: mv a0,a5
* 0xffffffff80023716 <+46>: mv a1,a2
0xffffffff80023718 <+48>: addi sp,sp,48
0xffffffff8002371a <+50>: ret
[Tracing goes to +126]
I realized I had the environment configured for clang in the last mail,
so here is actual GCC 14.3, which also spills in the prologue:
0xffffffff80023360 <+0>: addi sp,sp,-48
0xffffffff80023362 <+2>: sd s0,32(sp)
* 0xffffffff80023364 <+4>: sd s1,24(sp)
* 0xffffffff80023366 <+6>: sd s2,16(sp)
0xffffffff80023368 <+8>: sd ra,40(sp)
0xffffffff8002336a <+10>: addi s0,sp,48
* 0xffffffff8002336c <+12>: mv s2,a0
* 0xffffffff8002336e <+14>: mv s1,a1
0xffffffff80023370 <+16>: nop
* 0xffffffff80023374 <+20>: mv a0,s2
* 0xffffffff80023376 <+22>: mv a1,s1
0xffffffff80023378 <+24>: li a7,123
0xffffffff8002337c <+28>: li a6,456
0xffffffff80023380 <+32>: ecall
* 0xffffffff80023384 <+36>: mv s2,a0
* 0xffffffff80023386 <+38>: mv s1,a1
0xffffffff80023388 <+40>: nop
0xffffffff8002338c <+44>: ld ra,40(sp)
0xffffffff8002338e <+46>: ld s0,32(sp)
* 0xffffffff80023390 <+48>: mv a0,s2
* 0xffffffff80023392 <+50>: mv a1,s1
* 0xffffffff80023394 <+52>: ld s2,16(sp)
* 0xffffffff80023396 <+54>: ld s1,24(sp)
0xffffffff80023398 <+56>: addi sp,sp,48
0xffffffff8002339a <+58>: ret
[Tracing goes to +108]
And clang in the last mail inlined the tracepoints, because I put the
example function in sbi_ecall.c, which bloated the tracing slowpath, and
spilled one more register than needed.
With the function in sbi.c, to better simulate actual use (gcc examples
are already doing this), clang 20.1.6 and 19.1.7 do:
0xffffffff80016f08 <+0>: addi sp,sp,-32
0xffffffff80016f0a <+2>: sd ra,24(sp)
0xffffffff80016f0c <+4>: sd s0,16(sp)
* 0xffffffff80016f0e <+6>: sd s1,8(sp)
* 0xffffffff80016f10 <+8>: sd s2,0(sp)
0xffffffff80016f12 <+10>: addi s0,sp,32
0xffffffff80016f14 <+12>: nop
0xffffffff80016f18 <+16>: li a7,123
0xffffffff80016f1c <+20>: li a6,456
0xffffffff80016f20 <+24>: ecall
0xffffffff80016f24 <+28>: nop
0xffffffff80016f28 <+32>: ld ra,24(sp)
0xffffffff80016f2a <+34>: ld s0,16(sp)
* 0xffffffff80016f2c <+36>: ld s1,8(sp)
* 0xffffffff80016f2e <+38>: ld s2,0(sp)
0xffffffff80016f30 <+40>: addi sp,sp,32
0xffffffff80016f32 <+42>: ret
[Tracing goes to +94]
When compared to GCC 15.1, clang spills in the prologue, but doesn't
store around the static branch sites. The optimal result would be a
combination of what clang and GCC 15.1 do (code without any stars).
When I looked at real code samples, the behavior was roughly similar.
GCC just wasn't always placing the "mv"s as obviously.
Powered by blists - more mailing lists