lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48640298-effa-42d4-9137-a18a51637f03@linaro.org>
Date: Tue, 22 Apr 2025 11:21:28 +0100
From: James Clark <james.clark@...aro.org>
To: Yabin Cui <yabinc@...gle.com>
Cc: coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
 Suzuki K Poulose <suzuki.poulose@....com>, Mike Leach
 <mike.leach@...aro.org>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Arnaldo Carvalho de Melo <acme@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Liang Kan <kan.liang@...ux.intel.com>
Subject: Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU
 capability



On 21/04/2025 10:58 pm, Yabin Cui wrote:
> For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> and increase memory fragmentation.
> 
> This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> PMUs to request non-contiguous pages for their AUX buffers.
> 
> Signed-off-by: Yabin Cui <yabinc@...gle.com>
> ---
>   include/linux/perf_event.h  | 1 +
>   kernel/events/ring_buffer.c | 6 ++++++
>   2 files changed, 7 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..26ca35d6a9f2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
>   #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
>   #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
>   #define PERF_PMU_CAP_AUX_PAUSE			0x0200
> +#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES	0x0400
>   
>   /**
>    * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..87f42f4e8edc 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>   		max_order = ilog2(nr_pages);
>   		watermark = 0;
>   	}
> +	/*
> +	 * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> +	 * low-order allocations to reduce memory fragmentation.
> +	 */
> +	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> +		max_order = 0;
>   
>   	/*
>   	 * kcalloc_node() is unable to allocate buffer if the size is larger

Hi Yabin,

I was wondering if this is just the opposite of PERF_PMU_CAP_AUX_NO_SG, 
and that order 0 should be used by default for all devices to solve the 
issue you describe. Because we already have PERF_PMU_CAP_AUX_NO_SG for 
devices that need contiguous pages. Then I found commit 5768402fd9c6 
("perf/ring_buffer: Use high order allocations for AUX buffers 
optimistically") that explains that the current allocation strategy is 
an optimization.

Your change seems to decide that for certain devices we want to optimize 
for fragmentation rather than performance. If these are rarely used 
features specifically when looking at performance should we not continue 
to optimize for performance? Or at least make it user configurable?

Thanks
James


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ