[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48640298-effa-42d4-9137-a18a51637f03@linaro.org>
Date: Tue, 22 Apr 2025 11:21:28 +0100
From: James Clark <james.clark@...aro.org>
To: Yabin Cui <yabinc@...gle.com>
Cc: coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Suzuki K Poulose <suzuki.poulose@....com>, Mike Leach
<mike.leach@...aro.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Liang Kan <kan.liang@...ux.intel.com>
Subject: Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU
capability
On 21/04/2025 10:58 pm, Yabin Cui wrote:
> For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> and increase memory fragmentation.
>
> This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> PMUs to request non-contiguous pages for their AUX buffers.
>
> Signed-off-by: Yabin Cui <yabinc@...gle.com>
> ---
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..26ca35d6a9f2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400
>
> /**
> * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..87f42f4e8edc 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> max_order = ilog2(nr_pages);
> watermark = 0;
> }
> + /*
> + * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> + * low-order allocations to reduce memory fragmentation.
> + */
> + if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> + max_order = 0;
>
> /*
> * kcalloc_node() is unable to allocate buffer if the size is larger
Hi Yabin,
I was wondering if this is just the opposite of PERF_PMU_CAP_AUX_NO_SG,
and that order 0 should be used by default for all devices to solve the
issue you describe. Because we already have PERF_PMU_CAP_AUX_NO_SG for
devices that need contiguous pages. Then I found commit 5768402fd9c6
("perf/ring_buffer: Use high order allocations for AUX buffers
optimistically") that explains that the current allocation strategy is
an optimization.
Your change seems to decide that for certain devices we want to optimize
for fragmentation rather than performance. If these are rarely used
features specifically when looking at performance should we not continue
to optimize for performance? Or at least make it user configurable?
Thanks
James
Powered by blists - more mailing lists