lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <39117e6a-ebb6-4c92-a19c-2033c4e590cd@arm.com>
Date: Fri, 27 Jun 2025 18:57:31 +0100
From: Robin Murphy <robin.murphy@....com>
To: Arnd Bergmann <arnd@...nel.org>, Will Deacon <will@...nel.org>,
 Mark Rutland <mark.rutland@....com>, Nathan Chancellor <nathan@...nel.org>
Cc: Arnd Bergmann <arnd@...db.de>,
 Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
 Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>,
 Ilkka Koskinen <ilkka@...amperecomputing.com>,
 linux-arm-kernel@...ts.infradead.org, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH] perf/arm-cmn: reduce stack usage in arm_cmn_probe()

On 20/06/2025 12:51 pm, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@...db.de>
> 
> This function has a rather large stack usage, which triggers the
> warning limit with clang if I reduce the default to 1280 bytes:
> 
> drivers/perf/arm-cmn.c:2541:12: error: stack frame size (1312) exceeds limit (1280) in 'arm_cmn_probe' [-Werror,-Wframe-larger-than]
> 
> This is a combination of two problems:
> 
>   - The arm_cmn_discover() function has some large local variables and
>     gets inlined here by clang (but not gcc)
> 
>   - The (struct pmu) assignment adds an extra copy of the pmu structure
>     on the stack and does a memcpy() from that
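A minimal userspace sketch of the second point. It contrasts whole-struct assignment from a compound literal, which is the `(struct pmu)` pattern Arnd describes, with zeroing the target in place and setting individual fields. `struct fake_pmu`, `my_event_init()`, `my_read()`, `init_by_value()` and `init_in_place()` are hypothetical stand-ins, not kernel code. Whether the compiler actually builds an on-stack temporary for the compound literal depends on the compiler and optimization level. The sketch only shows that both forms produce the same result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for struct pmu. The real one is much
 * larger, which is what makes an extra on-stack copy expensive. */
struct fake_pmu {
	const char *name;
	int (*event_init)(int event);
	void (*read)(int event);
	unsigned long pad[64];	/* bulk so that a copy is not free */
};

static int my_event_init(int event)
{
	return event >= 0 ? 0 : -1;
}

static void my_read(int event)
{
	(void)event;
}

/* Whole-struct assignment from a compound literal. The compiler may
 * materialize the literal as a temporary on the stack and then memcpy()
 * it into *pmu (implementation-dependent). Members not named in the
 * initializer are zeroed. */
static void init_by_value(struct fake_pmu *pmu)
{
	assert(pmu != NULL);
	*pmu = (struct fake_pmu) {
		.name = "cmn",
		.event_init = my_event_init,
		.read = my_read,
	};
}

/* Alternative: zero the target in place, then set only the fields that
 * matter. No second struct-sized object is needed. */
static void init_in_place(struct fake_pmu *pmu)
{
	assert(pmu != NULL);
	memset(pmu, 0, sizeof(*pmu));
	pmu->name = "cmn";
	pmu->event_init = my_event_init;
	pmu->read = my_read;
}
```

Both functions leave `*pmu` with the same contents. The difference lies only in how much temporary storage the compiler may need to get there.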
> 
> Address the first one here by marking arm_cmn_discover() as noinline_for_stack,
> making clang behave more like gcc here. This gets it under the warning
> limit, though the total stack usage does not actually get reduced.
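A minimal userspace sketch of why `noinline_for_stack` silences the per-function warning without reducing total stack usage. In the kernel the macro expands to `noinline`, that is `__attribute__((__noinline__))`. It is redefined locally here so the sketch builds outside the tree. `discover()`, `probe()` and `struct big_cfg` are hypothetical stand-ins for `arm_cmn_discover()`, `arm_cmn_probe()` and their large locals. Keeping the callee out of line keeps its locals out of the caller's frame. The peak depth (caller frame plus callee frame) is roughly unchanged, which matches the remark above.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the kernel macro, which expands to noinline. */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical stand-in for the large locals in arm_cmn_discover(). */
struct big_cfg {
	unsigned char regs[512];
};

/* Kept out of line so that cfg occupies this function's frame only
 * while it runs, instead of being merged into the caller's frame. */
static noinline_for_stack int discover(unsigned int offset)
{
	struct big_cfg cfg;
	int sum = 0;

	memset(&cfg, 0, sizeof(cfg));
	cfg.regs[offset % sizeof(cfg.regs)] = 1;
	for (size_t i = 0; i < sizeof(cfg.regs); i++)
		sum += cfg.regs[i];
	return sum;
}

/* The caller's frame now holds only its own locals. While discover()
 * runs, both frames are live, so the peak depth is split up rather
 * than reduced. */
static int probe(void)
{
	unsigned char scratch[512];
	int ret;

	memset(scratch, 0, sizeof(scratch));
	ret = discover(7);
	assert(ret == 1);	/* discover() set exactly one byte */
	return ret + scratch[0];
}
```

Building this with and without the attribute, using `-Wframe-larger-than=<N>` (accepted by both gcc and clang), is one way to observe the per-function frame sizes change. The exact sizes depend on compiler, target and flags.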

At that point, though, it seems like we may as well just disable the 
warning :/

Fortunately it's not actually that hard to improve matters here, so I've 
just sent that patch:

https://lore.kernel.org/r/e7dd41bf0f1b098e2e4b01ef91318a4b272abff8.1751046159.git.robin.murphy@arm.com/T/#u

> It would be nice to also change the way struct pmu is initialized, but I
> see that this is done consistently for all pmu drivers. Ideally the function
> pointers should be moved into a 'static const' structure per driver as this
> is done in most other subsystems.

Beware that perf_pmu_register() does some further dynamic assignment of 
callbacks based on what the driver provided, so it's not necessarily 
straightforward to change in struct pmu itself. However, FWIW I have 
recently been playing with some ideas for reducing the amount of PMU 
registration boilerplate, and indeed one of them is to have a 
driver-level static template passed to a registration helper, which 
would at least make it easy to avoid the full by-value copies everywhere.
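A minimal sketch of the "static template plus registration helper" idea. Every name here is hypothetical: `struct fake_pmu`, `struct fake_pmu_template`, `fake_pmu_register()`, `default_start_txn()` and `cmn_pmu_template` are illustrations, not existing kernel APIs and not Robin's actual design. The driver keeps its callbacks in a `static const` template. The helper copies them field by field into the driver's already-allocated PMU and fills in a default for a callback left NULL, loosely mirroring the dynamic assignment that `perf_pmu_register()` is described as doing. No full by-value copy of the structure is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-in for struct pmu. */
struct fake_pmu {
	const char *name;
	int (*event_init)(int event);
	void (*start_txn)(struct fake_pmu *pmu);
	void *priv;
};

/* Hypothetical per-driver template, kept in read-only storage. */
struct fake_pmu_template {
	int (*event_init)(int event);
	void (*start_txn)(struct fake_pmu *pmu);
};

/* Default used when the driver does not provide start_txn. */
static void default_start_txn(struct fake_pmu *pmu)
{
	(void)pmu;	/* no-op */
}

/* Hypothetical helper: fills *pmu from the template in place and
 * supplies defaults for optional callbacks left NULL. */
static int fake_pmu_register(struct fake_pmu *pmu, const char *name,
			     const struct fake_pmu_template *tmpl, void *priv)
{
	assert(pmu != NULL && tmpl != NULL);
	if (!tmpl->event_init)
		return -1;	/* mandatory callback missing */

	pmu->name = name;
	pmu->priv = priv;
	pmu->event_init = tmpl->event_init;
	pmu->start_txn = tmpl->start_txn ? tmpl->start_txn : default_start_txn;
	return 0;
}

/* Driver side: one static const template per driver. */
static int cmn_event_init(int event)
{
	return event >= 0 ? 0 : -1;
}

static const struct fake_pmu_template cmn_pmu_template = {
	.event_init = cmn_event_init,
	/* .start_txn left NULL: the helper supplies a default */
};
```

A driver would then register with something like `fake_pmu_register(&cmn->pmu, name, &cmn_pmu_template, cmn)`. The function pointers themselves stay in one shared read-only object per driver.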

Thanks,
Robin.

> 
> Signed-off-by: Arnd Bergmann <arnd@...db.de>
> ---
>   drivers/perf/arm-cmn.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 031d45d0fe3d..430c89760391 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -2243,7 +2243,8 @@ static enum cmn_node_type arm_cmn_subtype(enum cmn_node_type type)
>   	}
>   }
>   
> -static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
> +static noinline_for_stack int arm_cmn_discover(struct arm_cmn *cmn,
> +					       unsigned int rgn_offset)
>   {
>   	void __iomem *cfg_region;
>   	struct arm_cmn_node cfg, *dn;
