Message-ID: <46adb25b-7b73-4824-a9ca-41617a5c4bca@iscas.ac.cn>
Date: Wed, 27 Aug 2025 15:07:36 +0800
From: Vivian Wang <wangruikang@...as.ac.cn>
To: Yury Norov <yury.norov@...il.com>
Cc: Paul Walmsley <paul.walmsley@...ive.com>,
 Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Alexandre Ghiti <alex@...ti.fr>, Rasmus Villemoes
 <linux@...musvillemoes.dk>, Charlie Jenkins <charlie@...osinc.com>,
 Xiao Wang <xiao.w.wang@...el.com>,
 Christoph Müllner <christoph.muellner@...ll.eu>,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Vivian Wang <uwu@...m.page>
Subject: Re: [PATCH v2 4/5] riscv: bitops: Use __riscv_has_extension_likely

On 8/22/25 01:46, Vivian Wang wrote:

> [...]
>> Can you please share bloat-o-meter report against this patch? Can you
>> also show an example of code generation before and after? Have you
>> tried the 'unlikely()` one? How the output looks?
> Thanks for the tip on bloat-o-meter. I'll take a look tomorrow.

That "tomorrow" took a while.

This is what it looks like, old being v6.17-rc1 and new being this patch
series.

It's not as identical as I had originally hoped, but I went through
each plus and a few of the minuses and confirmed that the actual asm goto
part seems to have been recreated as expected. The rest of the differences
appear to be explainable by unpredictable factors in the compiler (GCC
14.3.0 in my case).

For example, bpf_lru_populate seems to have got worse register
allocation: it uses one more callee-saved register. Moreover, RISC-V
compressed instructions have shorter encodings only when used with some
registers, so for example sd a1,32(s1) is encodable in 2 bytes, but sd
a1,32(s2) is only encodable in 4 bytes. This appears to explain the +16
in code size.
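
To make the register effect concrete, here is a rough sketch (mine, not
part of the patch or of any toolchain) of the rule I believe the assembler
applies on RV64 with the C extension: c.sd can only name x8..x15 (s0, s1,
a0-a5) and needs an 8-byte-aligned offset in 0..248, while c.sdsp needs sp
as the base and an 8-byte-aligned offset in 0..504. The helper names
rvc_reg() and sd_size() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if xN fits the 3-bit register fields used by
 * c.sd, i.e. x8..x15 (s0, s1, a0-a5). */
static bool rvc_reg(unsigned int x)
{
	return x >= 8 && x <= 15;
}

/* Hypothetical helper: expected size in bytes of "sd rs2, off(rs1)" on
 * RV64 with the C extension enabled. Registers are given by x-number. */
static unsigned int sd_size(unsigned int rs2, unsigned int rs1, long off)
{
	bool aligned = off >= 0 && (off % 8) == 0;

	/* c.sdsp: base must be sp (x2), any source register. */
	if (rs1 == 2 && aligned && off <= 504)
		return 2;

	/* c.sd: both registers must be in x8..x15. */
	if (rvc_reg(rs1) && rvc_reg(rs2) && aligned && off <= 248)
		return 2;

	return 4;
}

/* The two stores from the paragraph above: a1 = x11, s1 = x9, s2 = x18. */
static void sd_size_selfcheck(void)
{
	assert(sd_size(11, 9, 32) == 2);   /* sd a1,32(s1) */
	assert(sd_size(11, 18, 32) == 4);  /* sd a1,32(s2) */
}
```

So a base register that moves from s1 to s2 costs 2 extra bytes per store
that uses it, which is how a single extra callee-saved register can add up
to a delta of this size.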

As far as I can tell, which is basically me staring at objdump and
seeing "yeah looks normal", all of these are caused by random factors
due to changes in how we now write the control structures:

add/remove: 0/0 grow/shrink: 14/24 up/down: 72/-234 (-162)
Function                                     old     new   delta
bpf_lru_populate                             450     466     +16
spi_nor_scan                                3506    3516     +10
wants_mount_setattr                          688     696      +8
regulator_irq_map_event_simple               202     208      +6
idling_boosts_thr_without_issues             198     204      +6
trie_lookup_elem                             704     708      +4
ethnl_set_tsconfig                          1694    1698      +4
dev_xdp_attach                              1142    1146      +4
add_mtd_device                              1468    1472      +4
xhci_count_num_new_endpoints.isra            104     106      +2
rtl_init_one                                4360    4362      +2
queued_read_lock_slowpath                    414     416      +2
osq_lock                                     262     264      +2
cpufreq_dbs_governor_start                   520     522      +2
thaw_super_locked                            622     620      -2
stop_machine_from_inactive_cpu               372     370      -2
objpool_init                                 962     960      -2
memweight                                    168     166      -2
irq_destroy_ipi                              248     246      -2
fat_fill_super                              3408    3406      -2
create_boot_cache                            292     290      -2
snd_soc_dapm_get_volsw                       588     584      -4
ip_rcv_core                                  770     766      -4
ip_mc_check_igmp                             736     732      -4
tmigr_quick_check                            224     218      -6
nvdimm_security_flags                        152     146      -6
inode_switch_wbs_work_fn                    1934    1928      -6
sd_uhs2_power_up                             176     168      -8
mmc_power_up.part                            402     394      -8
__alloc_bucket_spinlocks                     190     182      -8
__clk_hw_register_mux                        624     612     -12
bfq_bfqq_expire                              872     858     -14
perf_prepare_sample                         1810    1794     -16
wq_update_node_max_active                    308     288     -20
blk_mq_num_queues                             94      74     -20
register_pidns_sysctls                       248     226     -22
dw8250_setup_port                           1212    1182     -30
build_sched_domains                         4748    4716     -32
Total: Before=16029885, After=16029723, chg -0.00%

That's all I can figure out. I hope this is satisfactory to anyone reading.

Vivian "dramforever" Wang

