Message-ID: <20250311114012.GE19424@noisy.programming.kicks-ass.net>
Date: Tue, 11 Mar 2025 12:40:12 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: kan.liang@...ux.intel.com
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
linux-kernel@...r.kernel.org, ak@...ux.intel.com,
eranian@...gle.com
Subject: Re: [PATCH] perf: Extend per event callchain limit to branch stack

On Mon, Mar 10, 2025 at 11:15:36AM -0700, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
>
> The commit 97c79a38cd45 ("perf core: Per event callchain limit")
> introduced a per-event term to allow finer tuning of the depth of
> callchains to save space.
>
> It should be applied to the branch stack as well. For example, autoFDO
> collections require the maximum number of LBR entries, while other
> system-wide LBR users may only be interested in the latest few LBRs.
> A per-event LBR depth would save space in the perf output buffer.
>
> The patch simply drops the unwanted branches, but the HW still collects
> the maximum number of branches. There may be a model-specific
> optimization that can reduce the HW depth in some cases to reduce the
> overhead further, but it isn't included in this patch because it's not
> useful for all cases. For example, ARCH LBR can utilize PEBS and XSAVE
> to collect LBRs, so the depth should have less impact on the collecting
> overhead. The model-specific optimization may be implemented separately
> later.
>
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
Thanks!
> ---
> include/linux/perf_event.h | 3 +++
> include/uapi/linux/perf_event.h | 2 ++
> 2 files changed, 5 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 24f2eba200ac..bca1dfd30276 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1347,6 +1347,9 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
>
>  	if (branch_sample_hw_index(event))
>  		size += sizeof(u64);
> +
> +	brs->nr = min_t(u16, event->attr.sample_max_stack, brs->nr);
> +
>  	size += brs->nr * sizeof(struct perf_branch_entry);
>
> /*
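
For readers following along outside the kernel tree, the effect of the hunk above on the sample size can be sketched in plain userspace C. This is only an illustration, not the kernel code: `struct branch_entry`, `min_u16()` and `brstack_sample_size()` are hypothetical names, and the entry is assumed to be three 64-bit words. The cast to a 16-bit value before comparing mirrors what `min_t(u16, ...)` does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a branch record, assumed to be three 64-bit words. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/* Same idea as min_t(u16, a, b): cast both operands to u16, then compare. */
uint16_t min_u16(uint64_t a, uint64_t b)
{
	uint16_t x = (uint16_t)a;
	uint16_t y = (uint16_t)b;

	return x < y ? x : y;
}

/*
 * Clamp *nr to the per-event limit and return the number of bytes the
 * branch stack would occupy in the sample: the nr field, an optional
 * hw_index field, and the surviving entries.
 */
size_t brstack_sample_size(uint64_t *nr, uint16_t sample_max_stack, int has_hw_index)
{
	size_t size = sizeof(uint64_t);		/* nr */

	if (has_hw_index)
		size += sizeof(uint64_t);	/* hw_idx */

	*nr = min_u16(sample_max_stack, *nr);

	size += *nr * sizeof(struct branch_entry);
	return size;
}
```

With 32 collected entries and a per-event limit of 8, only 8 entries are accounted for in the record; with a limit larger than the collected count, nothing is dropped.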
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0524d541d4e3..5fc753c23734 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -385,6 +385,8 @@ enum perf_event_read_format {
> *
> * @sample_max_stack: Max number of frame pointers in a callchain,
> * should be < /proc/sys/kernel/perf_event_max_stack
> + * Max number of entries of branch stack
> + * should be < hardware limit
> */
> struct perf_event_attr {
>
> --
> 2.38.1
>
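
From the user side, a minimal sketch of how an event might request a short branch stack, assuming a kernel with this patch applied. The helper name `init_branch_attr()`, the cycles event, and the sample period are illustrative choices, not part of the patch; the struct fields and constants come from the existing uapi header.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fill in an attr for a cycles event that samples the branch stack.
 * With the patch applied, sample_max_stack would also bound the number
 * of branch entries written to each sample (assumption: patched kernel).
 */
void init_branch_attr(struct perf_event_attr *attr, unsigned short depth)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary illustrative period */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_ANY | PERF_SAMPLE_BRANCH_USER;
	attr->sample_max_stack = depth;	/* per-event limit */
	attr->disabled = 1;		/* caller enables the event later */
}
```

The filled-in attr would then be passed to the perf_event_open syscall as usual; that call is omitted here since it needs privileges and LBR-capable hardware.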