Message-ID: <46adb25b-7b73-4824-a9ca-41617a5c4bca@iscas.ac.cn>
Date: Wed, 27 Aug 2025 15:07:36 +0800
From: Vivian Wang <wangruikang@...as.ac.cn>
To: Yury Norov <yury.norov@...il.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Alexandre Ghiti <alex@...ti.fr>, Rasmus Villemoes
<linux@...musvillemoes.dk>, Charlie Jenkins <charlie@...osinc.com>,
Xiao Wang <xiao.w.wang@...el.com>,
Christoph Müllner <christoph.muellner@...ll.eu>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
Vivian Wang <uwu@...m.page>
Subject: Re: [PATCH v2 4/5] riscv: bitops: Use __riscv_has_extension_likely
On 8/22/25 01:46, Vivian Wang wrote:
> [...]
>> Can you please share bloat-o-meter report against this patch? Can you
>> also show an example of code generation before and after? Have you
>> tried the 'unlikely()` one? How the output looks?
> Thanks for the tip on bloat-o-meter. I'll take a look tomorrow.
That "tomorrow" took a while.
Here is the bloat-o-meter report, with old being v6.17-rc1 and new being
this patch series.
It's not as close to identical as I had originally hoped, but I went
through each increase and a few of the decreases and confirmed that the
actual asm goto part seems to have been recreated as expected. The rest
of the differences appear to be explainable by unpredictable factors in
the compiler (GCC 14.3.0 in my case).
For example, bpf_lru_populate seems to have gotten worse register
allocation: it uses one more callee-saved register. Moreover, RISC-V
compressed instructions have shorter encodings only when used with
certain registers, so for example sd a1,32(s1) is encodable in 2 bytes,
but sd a1,32(s2) is only encodable in 4 bytes. This appears to explain
the +16 in code size.
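To illustrate the encoding constraint, here is a rough Python sketch (not part of the patch series; sd_encoding_size and the register table are hypothetical names made up for this example). It assumes RV64 with the C extension and only models the two compressed store forms, c.sd and c.sdsp, whose register operands are limited to x8-x15 except for the sp-relative form:

```python
# ABI register names mapped to x-register numbers (RISC-V calling convention).
ABI_REGS = {"zero": 0, "ra": 1, "sp": 2, "gp": 3, "tp": 4,
            "t0": 5, "t1": 6, "t2": 7, "s0": 8, "s1": 9}
ABI_REGS.update({f"a{i}": 10 + i for i in range(8)})      # a0-a7  -> x10-x17
ABI_REGS.update({f"s{i}": 16 + i for i in range(2, 12)})  # s2-s11 -> x18-x27
ABI_REGS.update({f"t{i}": 25 + i for i in range(3, 7)})   # t3-t6  -> x28-x31


def in_compressed_set(reg):
    """True if reg is one of x8-x15, the registers c.sd can name."""
    return 8 <= ABI_REGS[reg] <= 15


def sd_encoding_size(src, base, offset):
    """Return 2 if `sd src, offset(base)` fits c.sd or c.sdsp, else 4.

    Hypothetical helper: assumes RV64C and a non-negative offset that
    already fits the 12-bit immediate of the uncompressed form.
    """
    if offset < 0 or offset % 8 != 0:
        return 4
    # c.sdsp: sp-relative, any source register, offset 0..504 in steps of 8.
    if base == "sp" and offset <= 504:
        return 2
    # c.sd: both registers in x8-x15, offset 0..248 in steps of 8.
    if in_compressed_set(src) and in_compressed_set(base) and offset <= 248:
        return 2
    return 4


print(sd_encoding_size("a1", "s1", 32))  # s1 is x9, inside x8-x15
print(sd_encoding_size("a1", "s2", 32))  # s2 is x18, outside x8-x15
```

With s1 as the base the store fits the 2-byte form; with s2 it falls back to the 4-byte form, so a handful of such stores moving to a different callee-saved register is enough to account for a small size delta.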
As far as I can tell, which is basically me staring at objdump and
seeing "yeah looks normal", all of these are caused by random factors
due to changes in how we now write the control structures:
add/remove: 0/0 grow/shrink: 14/24 up/down: 72/-234 (-162)
Function                                     old     new   delta
bpf_lru_populate                             450     466     +16
spi_nor_scan                                3506    3516     +10
wants_mount_setattr                          688     696      +8
regulator_irq_map_event_simple               202     208      +6
idling_boosts_thr_without_issues             198     204      +6
trie_lookup_elem                             704     708      +4
ethnl_set_tsconfig                          1694    1698      +4
dev_xdp_attach                              1142    1146      +4
add_mtd_device                              1468    1472      +4
xhci_count_num_new_endpoints.isra            104     106      +2
rtl_init_one                                4360    4362      +2
queued_read_lock_slowpath                    414     416      +2
osq_lock                                     262     264      +2
cpufreq_dbs_governor_start                   520     522      +2
thaw_super_locked                            622     620      -2
stop_machine_from_inactive_cpu               372     370      -2
objpool_init                                 962     960      -2
memweight                                    168     166      -2
irq_destroy_ipi                              248     246      -2
fat_fill_super                              3408    3406      -2
create_boot_cache                            292     290      -2
snd_soc_dapm_get_volsw                       588     584      -4
ip_rcv_core                                  770     766      -4
ip_mc_check_igmp                             736     732      -4
tmigr_quick_check                            224     218      -6
nvdimm_security_flags                        152     146      -6
inode_switch_wbs_work_fn                    1934    1928      -6
sd_uhs2_power_up                             176     168      -8
mmc_power_up.part                            402     394      -8
__alloc_bucket_spinlocks                     190     182      -8
__clk_hw_register_mux                        624     612     -12
bfq_bfqq_expire                              872     858     -14
perf_prepare_sample                         1810    1794     -16
wq_update_node_max_active                    308     288     -20
blk_mq_num_queues                             94      74     -20
register_pidns_sysctls                       248     226     -22
dw8250_setup_port                           1212    1182     -30
build_sched_domains                         4748    4716     -32
Total: Before=16029885, After=16029723, chg -0.00%
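As a quick sanity check on the report, the summary line can be recomputed from the per-function deltas. The Python sketch below is only illustrative arithmetic on the numbers in the table (variable names are hypothetical; it does not invoke or reimplement bloat-o-meter):

```python
# Per-function deltas copied from the report, in table order.
grow = [16, 10, 8, 6, 6, 4, 4, 4, 4, 2, 2, 2, 2, 2]
shrink = ([-2] * 7 + [-4] * 3 + [-6] * 3 + [-8] * 3
          + [-12, -14, -16, -20, -20, -22, -30, -32])

before = 16029885            # "Before" total from the report

up = sum(grow)               # bytes gained by functions that grew
down = sum(shrink)           # bytes lost by functions that shrank
net = up + down              # overall change in bytes
after = before + net
pct = net * 100.0 / before   # relative change in percent

print(f"grow/shrink: {len(grow)}/{len(shrink)} up/down: {up}/{down} ({net})")
print(f"Before={before}, After={after}, chg {pct:+.2f}%")
```

The net change is about a thousandth of a percent of the total, which is why the summary rounds to -0.00%.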
That's all I can figure out. I hope this is satisfactory to anyone reading.
Vivian "dramforever" Wang