Date: Sun, 11 Feb 2024 22:20:25 +0000
From: patchwork-bot+netdevbpf@...nel.org
To: Marco Elver <elver@...gle.com>
Cc: ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
 martin.lau@...ux.dev, song@...nel.org, yonghong.song@...ux.dev,
 john.fastabend@...il.com, kpsingh@...nel.org, sdf@...gle.com,
 haoluo@...gle.com, jolsa@...nel.org, mykolal@...com, shuah@...nel.org,
 iii@...ux.ibm.com, laoar.shao@...il.com, tj@...nel.org, bpf@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH bpf-next v2] bpf: Allow compiler to inline most of
 bpf_local_storage_lookup()

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@...nel.org>:

On Wed,  7 Feb 2024 13:26:17 +0100 you wrote:
> In various performance profiles of kernels with BPF programs attached,
> bpf_local_storage_lookup() appears as a significant portion of CPU
> cycles spent. To enable the compiler to generate more optimal code, turn
> bpf_local_storage_lookup() into a static inline function, where only the
> cache insertion code path is outlined.
> 
> Notably, outlining cache insertion helps avoid bloating callers by
> duplicating the setup of calls to raw_spin_lock_irqsave() and
> raw_spin_unlock_irqrestore() (on architectures which do not inline
> spin_lock/unlock, such as x86), which would cause the compiler to
> produce worse code by deciding to outline otherwise inlinable
> functions. The call overhead is neutral, because we make 2 calls either
> way: either calling raw_spin_lock_irqsave() and
> raw_spin_unlock_irqrestore(); or calling
> __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(),
> followed by a tail-call to raw_spin_unlock_irqrestore() where the
> compiler can perform TCO and (in optimized uninstrumented builds) turn
> it into a plain jump. The call to __bpf_local_storage_insert_cache()
> can be elided entirely if cacheit_lockit is a false constant expression.
> 
> [...]
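
For illustration, the following is a minimal userspace sketch of the
pattern the quoted description explains: keep the lookup fast path as a
static inline function and outline only the locked cache-insertion slow
path. Every name here (struct entry, struct cache, lookup(),
insert_cache_slowpath()) is a hypothetical stand-in, not the actual
bpf_local_storage API from the patch, and pthread_mutex_lock/unlock
stand in for raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore().

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real API. */
struct entry {
        int key;
        void *val;
        struct entry *next;
};

struct cache {
        struct entry *slots[16];   /* per-index cache */
        struct entry *head;        /* full list: the slower search */
        pthread_mutex_t lock;      /* stand-in for the raw spinlock */
};

/*
 * Outlined slow path: keeping the lock/unlock call setup here, rather
 * than duplicated in every caller, keeps the inlined fast path small.
 */
static void insert_cache_slowpath(struct cache *c, unsigned int idx,
                                  struct entry *e)
{
        pthread_mutex_lock(&c->lock);
        c->slots[idx] = e;
        pthread_mutex_unlock(&c->lock);  /* tail position: TCO candidate */
}

/* Fast path stays inline: a cache hit is a load and a compare, no call. */
static inline struct entry *lookup(struct cache *c, unsigned int idx,
                                   int key, bool cacheit_lockit)
{
        struct entry *e = c->slots[idx];

        if (e && e->key == key)
                return e;                  /* cache hit */

        for (e = c->head; e; e = e->next)  /* slower full search */
                if (e->key == key)
                        break;

        /*
         * If cacheit_lockit is a constant false expression, the
         * compiler can elide this call entirely, as noted above.
         */
        if (e && cacheit_lockit)
                insert_cache_slowpath(c, idx, e);
        return e;
}

int main(void)
{
        struct entry e1 = { .key = 42, .val = NULL, .next = NULL };
        struct cache c = { .head = &e1,
                           .lock = PTHREAD_MUTEX_INITIALIZER };

        /* First lookup misses the cache, finds e1, and caches it. */
        struct entry *hit = lookup(&c, 0, 42, true);
        /* Second lookup is a pure cache hit on the inlined fast path. */
        return (hit == lookup(&c, 0, 42, true)) ? 0 : 1;
}

The point, as the description argues, is that a caller of lookup()
makes at most one outlined call on the insertion path instead of two
(lock then unlock), so inlining the lookup does not bloat its callers.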

Here is the summary with links:
  - [bpf-next,v2] bpf: Allow compiler to inline most of bpf_local_storage_lookup()
    https://git.kernel.org/bpf/bpf-next/c/68bc61c26cac

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


