Message-ID: <20250311114012.GE19424@noisy.programming.kicks-ass.net>
Date: Tue, 11 Mar 2025 12:40:12 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: kan.liang@...ux.intel.com
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
	linux-kernel@...r.kernel.org, ak@...ux.intel.com,
	eranian@...gle.com
Subject: Re: [PATCH] perf: Extend per event callchain limit to branch stack

On Mon, Mar 10, 2025 at 11:15:36AM -0700, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
> 
> The commit 97c79a38cd45 ("perf core: Per event callchain limit")
> introduced a per-event term to allow finer tuning of the depth of
> callchains to save space.
> 
> It should be applied to the branch stack as well. For example, autoFDO
> collections require the maximum number of LBR entries. Meanwhile, other
> system-wide LBR users may only be interested in the latest few LBR
> entries. A per-event LBR depth would save space in the perf output buffer.
> 
> The patch simply drops the branches that are not of interest, but the HW
> still collects the maximum number of branches. There may be a
> model-specific optimization that can reduce the HW depth in some cases to
> reduce the overhead further, but it isn't included in this patch because
> it's not useful in all cases. For example, ARCH LBR can utilize PEBS and
> XSAVE to collect LBRs, so the depth should have less impact on the
> collection overhead. The model-specific optimization may be implemented
> separately later.
> 
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>

Thanks!

> ---
>  include/linux/perf_event.h      | 3 +++
>  include/uapi/linux/perf_event.h | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 24f2eba200ac..bca1dfd30276 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
>  
>  	if (branch_sample_hw_index(event))
>  		size += sizeof(u64);
> +
> +	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
> +
>  	size += brs->nr * sizeof(struct perf_branch_entry);
>  
>  	/*
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0524d541d4e3..5fc753c23734 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -385,6 +385,8 @@ enum perf_event_read_format {
>   *
>   * @sample_max_stack: Max number of frame pointers in a callchain,
>   *		      should be < /proc/sys/kernel/perf_event_max_stack
> + *		      Max number of entries of branch stack
> + *		      should be < hardware limit
>   */
>  struct perf_event_attr {
>  
> -- 
> 2.38.1
> 
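
For readers following along, the effect of the clamp on the sample size can
be modelled in user space. This is only a sketch, not the kernel code:
struct branch_entry, clamp_branch_nr() and branch_stack_size() are
hypothetical names, the entry layout assumes three u64-sized fields (from,
to, flags) as in struct perf_branch_entry, and the comparison is done in 64
bits rather than via min_t(u16, ...), which should be equivalent for
realistic LBR depths.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct perf_branch_entry (assumed 24 bytes). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Hypothetical helper: entries kept after applying the per-event limit. */
static uint64_t clamp_branch_nr(uint64_t hw_nr, uint16_t sample_max_stack)
{
	return hw_nr < sample_max_stack ? hw_nr : sample_max_stack;
}

/*
 * Hypothetical model of the size accounting around the patched line:
 * one u64 for nr, an optional u64 for hw_idx, then the kept entries.
 */
static size_t branch_stack_size(uint64_t hw_nr, uint16_t sample_max_stack,
				int hw_index)
{
	size_t size = sizeof(uint64_t);		/* nr */

	if (hw_index)
		size += sizeof(uint64_t);	/* hw_idx */

	size += clamp_branch_nr(hw_nr, sample_max_stack) *
		sizeof(struct branch_entry);
	return size;
}
```

With 32 hardware entries and a limit of 8, the model keeps 8 entries, so the
record shrinks while the hardware still captures all 32, matching the
behaviour described in the commit message.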
