Message-Id: <DAUSD38QIV6D.1YO5ASNI3EUGV@ventanamicro.com>
Date: Tue, 24 Jun 2025 15:09:09 +0200
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "Palmer Dabbelt" <palmer@...belt.com>
Cc: <linux-riscv@...ts.infradead.org>, <linux-kernel@...r.kernel.org>, "Paul
Walmsley" <paul.walmsley@...ive.com>, <aou@...s.berkeley.edu>, "Alexandre
Ghiti" <alex@...ti.fr>, "Atish Patra" <atishp@...osinc.com>,
<ajones@...tanamicro.com>, <cleger@...osinc.com>,
<apatel@...tanamicro.com>, <thomas.weissschuh@...utronix.de>,
<david.laight.linux@...il.com>
Subject: Re: [PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints
2025-06-23T15:54:00-07:00, Palmer Dabbelt <palmer@...belt.com>:
> Having patch 3 of 2 is not normal.
Sorry, I wanted to distinguish it from the original series without
sending a new one, because it's quite a radical proposal that I don't
necessarily want to get merged.
Would "[RFC 3/2]", "[RFC 3/3]", or something else look better while
raising the same alarms?
> On Thu, 19 Jun 2025 12:03:15 PDT (-0700), rkrcmar@...tanamicro.com wrote:
> So the issue is the extra save/restore on function entry? That's the
> sort of thing shrink wrapping is supposed to help with. It's been
> implemented in GCC for a while, but I'm not sure how well it's been
> pushed on (IIRC it was just one of the SPEC workloads).
Yes, shrink wrapping could help if compilers can figure out what to do
with static_keys. It's hopefully going to sort itself out in the future.
We'd ideally have some way to tell the compiler to always keep the
tracepoints inside their branches, to make them less fragile, but that
is probably asking too much from C.
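To illustrate the shape being asked for, here is a minimal userspace sketch, not the kernel code. It assumes a plain global flag (trace_enabled) as a stand-in for a static key, and the names trace_slowpath, fake_ecall and traced_call are hypothetical. The tracing work sits in a noinline, cold function behind an unlikely branch:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a static key. The kernel patches a nop/jump
 * at the branch site instead of loading a flag. */
static bool trace_enabled;
static unsigned long trace_hits;

/* Out-of-line slow path. noinline + cold asks the compiler to keep this
 * body (and its register pressure) out of the caller's hot path. */
__attribute__((noinline, cold))
static void trace_slowpath(uintptr_t a0, uintptr_t a1)
{
	trace_hits++;
	fprintf(stderr, "trace: a0=%lu a1=%lu\n",
		(unsigned long)a0, (unsigned long)a1);
}

/* Hypothetical stand-in for the real ecall. */
static uintptr_t fake_ecall(uintptr_t a0, uintptr_t a1)
{
	return a0 + a1;
}

uintptr_t traced_call(uintptr_t a0, uintptr_t a1)
{
	/* Fast path: the branch is expected not to be taken. */
	if (__builtin_expect(trace_enabled, 0))
		trace_slowpath(a0, a1);
	return fake_ecall(a0, a1);
}
```

Note that a0 and a1 are still live across the call to trace_slowpath(), so the compiler may keep saving callee-saved registers in the prologue unless shrink wrapping moves those saves into the branch. The attributes are hints, not guarantees.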
I think GCC 15.1 had some shrink-wrapping improvements, but I've only
been using 14.3 so far...
> That said, this is kind of hard to reason about. Can you pull out a
> smaller example?
I posted an example of the original 8 argument ecall in v1:
https://lore.kernel.org/linux-riscv/20250612145754.2126147-2-rkrcmar@ventanamicro.com/T/#m1d441ab3de3e6d6b3b8d120b923f2e2081918a98
For another example, let's have the following function:
  struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
  {
  	return sbi_ecall(123, 456, a0, a1);
  }
The disassembly without tracepoints (with -fno-omit-frame-pointer):
(It could have been just "li;li;ecall;ret" without the frame pointer.)
0xffffffff80016d48 <+0>: addi sp,sp,-16
0xffffffff80016d4a <+2>: sd ra,8(sp)
0xffffffff80016d4c <+4>: sd s0,0(sp)
0xffffffff80016d4e <+6>: addi s0,sp,16
0xffffffff80016d50 <+8>: li a7,123
0xffffffff80016d54 <+12>: li a6,456
0xffffffff80016d58 <+16>: ecall
0xffffffff80016d5c <+20>: ld ra,8(sp)
0xffffffff80016d5e <+22>: ld s0,0(sp)
0xffffffff80016d60 <+24>: addi sp,sp,16
0xffffffff80016d62 <+26>: ret
With tracepoints, the situation is worse... the optimal outcome would
add two nops, but the actual result is:
0xffffffff80017720 <+0>: addi sp,sp,-48
0xffffffff80017722 <+2>: sd ra,40(sp)
0xffffffff80017724 <+4>: sd s0,32(sp)
0xffffffff80017726 <+6>: sd s1,24(sp)
0xffffffff80017728 <+8>: sd s2,16(sp)
0xffffffff8001772a <+10>: sd s3,8(sp)
0xffffffff8001772c <+12>: addi s0,sp,48
0xffffffff8001772e <+14>: nop
0xffffffff80017730 <+16>: nop
0xffffffff80017734 <+20>: li a7,123
0xffffffff80017738 <+24>: li a6,456
0xffffffff8001773c <+28>: ecall
0xffffffff80017740 <+32>: nop
0xffffffff80017744 <+36>: ld ra,40(sp)
0xffffffff80017746 <+38>: ld s0,32(sp)
0xffffffff80017748 <+40>: ld s1,24(sp)
0xffffffff8001774a <+42>: ld s2,16(sp)
0xffffffff8001774c <+44>: ld s3,8(sp)
0xffffffff8001774e <+46>: addi sp,sp,48
0xffffffff80017750 <+48>: ret
[Tracing slowpath continues to 202.]
i.e. we spill 3 extra registers, which is at least better than v1. I'll
try again with GCC 15.1, and get back to you if it actually improves the
situation.
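The frame sizes in the two listings can be sanity-checked with a small sketch. It assumes RV64 (8-byte registers) and a 16-byte aligned stack pointer, and frame_bytes is a hypothetical helper name:

```c
#include <stddef.h>

/* Assumptions: RV64 (8-byte registers), 16-byte stack alignment. */
enum { REG_BYTES = 8, STACK_ALIGN = 16 };

/* Bytes of stack needed to save nregs registers, rounded up to the
 * stack alignment. */
static size_t frame_bytes(size_t nregs)
{
	size_t raw = nregs * REG_BYTES;

	return (raw + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}
```

Saving ra and s0 gives frame_bytes(2) == 16, matching "addi sp,sp,-16". Saving ra, s0, s1, s2 and s3 gives frame_bytes(5) == 48, matching "addi sp,sp,-48".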