Message-ID: <39117e6a-ebb6-4c92-a19c-2033c4e590cd@arm.com>
Date: Fri, 27 Jun 2025 18:57:31 +0100
From: Robin Murphy <robin.murphy@....com>
To: Arnd Bergmann <arnd@...nel.org>, Will Deacon <will@...nel.org>,
Mark Rutland <mark.rutland@....com>, Nathan Chancellor <nathan@...nel.org>
Cc: Arnd Bergmann <arnd@...db.de>,
Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>,
Ilkka Koskinen <ilkka@...amperecomputing.com>,
linux-arm-kernel@...ts.infradead.org, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH] perf/arm-cmn: reduce stack usage in arm_cmn_probe()
On 20/06/2025 12:51 pm, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@...db.de>
>
> This function has a rather large stack usage, which triggers the
> warning limit with clang if I reduce the default to 1280 bytes:
>
> drivers/perf/arm-cmn.c:2541:12: error: stack frame size (1312) exceeds limit (1280) in 'arm_cmn_probe' [-Werror,-Wframe-larger-than]
>
> This is a combination of two problems:
>
> - The arm_cmn_discover() function has some large local variables and
> gets inlined here by clang (but not gcc)
>
> - The (struct pmu) assignment adds an extra copy of the pmu structure
> on the stack and does a memcpy() from that
>
> Address the first one here by marking arm_cmn_discover() as noinline_for_stack,
> making clang behave more like gcc here. This gets it under the warning
> limit, though the total stack usage does not actually get reduced.
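For anyone following along, the two effects described above can be reproduced in a small userspace sketch. Everything here is a hypothetical stand-in: struct big_ops plays the role of struct pmu, the function names are made up, and noinline_for_stack is redefined locally on the assumption that it boils down to the compiler's noinline attribute, as it does in the kernel headers. Whether a temporary copy is really materialised for the compound literal depends on the compiler and optimisation level.

```c
#include <string.h>

/* Local stand-in for the kernel macro; assumed to map to plain noinline. */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical stand-in for a large ops structure such as struct pmu. */
struct big_ops {
	int (*event_init)(int);
	void (*read)(int);
	char pad[256];
};

struct dev {
	struct big_ops ops;
};

static int demo_event_init(int x)
{
	return x + 1;
}

/*
 * Compound-literal assignment: the compiler may build the whole
 * structure in a stack temporary and then copy it into d->ops.
 */
static void init_by_copy(struct dev *d)
{
	d->ops = (struct big_ops) {
		.event_init = demo_event_init,
	};
}

/*
 * The large local buffer stays in this function's own frame because
 * the attribute stops the compiler from merging it into the caller.
 */
static noinline_for_stack int discover(struct dev *d)
{
	char scratch[512];

	memset(scratch, 0, sizeof(scratch));
	scratch[0] = 1;
	return d->ops.event_init(scratch[0]);
}

/* Caller whose frame no longer has to include discover()'s locals. */
static int probe(struct dev *d)
{
	init_by_copy(d);
	return discover(d);
}
```

Building something like this with -Wframe-larger-than=<n>, with and without the attribute, shows the per-function frame size moving while the worst-case stack depth of the call chain stays about the same.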

At that point, though, it seems like we may as well just disable the
warning :/

Fortunately it's not actually that hard to improve matters here, so I've
just sent that patch:

https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com/T/#u

> It would be nice to also change the way struct pmu is initialized, but I
> see that this is done consistently for all pmu drivers. Ideally the function
> pointers should be moved into a 'static const' structure per driver as this
> is done in most other subsystems.
Beware that perf_pmu_register() does some further dynamic assignment of
callbacks based on what the driver provided, so it's not necessarily
straightforward to change in struct pmu itself. However, FWIW I have
recently been playing with some ideas for reducing the amount of PMU
registration boilerplate, and indeed one of them is to have a
driver-level static template passed to a registration helper, which
would at least make it easy to avoid the full by-value copies everywhere.
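
To make the shape of that idea concrete, here is a rough userspace sketch, not actual kernel code and not a proposed API. All of the demo_* names are hypothetical. The driver keeps one static const table of callbacks, the registration helper stores a pointer to it rather than copying it by value, and any callback the driver left out is filled in on the per-instance side, loosely mirroring the dynamic assignment that perf_pmu_register() does today.

```c
#include <stddef.h>

/* Hypothetical driver-level callback template; never written at runtime. */
struct demo_pmu_ops {
	int (*event_init)(int event);
	void (*start)(int event);
	void (*stop)(int event);	/* optional, may be NULL */
};

/* Hypothetical per-instance state; holds a pointer to the template. */
struct demo_pmu {
	const char *name;
	const struct demo_pmu_ops *ops;	/* shared template, not a copy */
	void (*stop)(int event);	/* resolved at registration time */
};

static int demo_event_init(int event)
{
	return event >= 0 ? 0 : -1;
}

static void demo_start(int event)
{
	(void)event;
}

/* Default used when the driver does not provide a stop callback. */
static void demo_nop(int event)
{
	(void)event;
}

/* One static const template per driver; .stop is deliberately omitted. */
static const struct demo_pmu_ops demo_ops = {
	.event_init = demo_event_init,
	.start = demo_start,
};

/* Hypothetical registration helper: no by-value struct copy involved. */
static int demo_pmu_register(struct demo_pmu *pmu, const char *name,
			     const struct demo_pmu_ops *ops)
{
	if (!ops || !ops->event_init)
		return -1;

	pmu->name = name;
	pmu->ops = ops;
	/* Dynamic fallback lives in the instance, so the template stays const. */
	pmu->stop = ops->stop ? ops->stop : demo_nop;
	return 0;
}
```

A caller would do something like demo_pmu_register(&pmu, "demo", &demo_ops) from its probe routine, and nothing the size of the callback table ever lands on the stack.
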
Thanks,
Robin.
>
> Signed-off-by: Arnd Bergmann <arnd@...db.de>
> ---
> drivers/perf/arm-cmn.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 031d45d0fe3d..430c89760391 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2243,7 +2243,8 @@ static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
>  	}
>  }
>  
> -static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> +static noinline_for_stack int arm_cmn_discover(struct arm_cmn *cmn,
> +					       unsigned int rgn_offset)
>  {
>  	void __iomem *cfg_region;
>  	struct arm_cmn_node cfg, *dn;