lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zo8ZDBisWJonBVqF@finisterre.sirena.org.uk>
Date: Thu, 11 Jul 2024 00:28:12 +0100
From: Mark Brown <broonie@...nel.org>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH] arm64/fpsimd: Ensure we don't contend a SMCU from idling
 CPUs

On Wed, Jul 10, 2024 at 09:10:53PM +0100, Catalin Marinas wrote:
> On Tue, Jun 18, 2024 at 02:57:42PM +0100, Mark Brown wrote:

> > When we enter the kernel we currently don't update any of the floating
> > point register state until either something else uses floating point or we
> > get a CPU_PM_ENTER notification during suspend or cpuidle. This means that
> > for a system which has been configured with both suspend and cpuidle
> > disabled we will leave whatever floating point state was loaded in the
> > registers present while a CPU is idling.

> I guess this approach is useful when the kernel does a light WFI rather
> than going all the way to firmware for a deep sleep state. In general,
> the shallower the sleep state is, the more CPU state is retained. From a
> power perspective, I wonder whether we should leave the decision to drop
> the SME state to a cpuidle driver.

The concern here is if we don't have a cpuidle driver - we could also
make this conditional on !CPUIDLE.

> Which situations should we consider for such idle scenario (we discussed
> them offline briefly)? I think:

> 1. Thread migration: a thread using SME is moved to a different CPU.
>    Here SMSTOP makes sense because a new thread scheduled eventually
>    will need a different SME state.

> 2. Thread page fault followed by waiting for I/O etc. and the kernel may
>    switch to idle. Here it's probably less beneficial to do an SMSTOP.

> Any other cases? Blocking syscalls don't matter since we don't preserve
> the state between calls.

For syscalls we explicitly disable streaming mode, but we do allow ZA to
be active so you might have a blocking syscall with ZA enabled.  Having
state in ZA is less of a concern than streaming mode, it will have a
power impact but it is much less likely that there will be a performance
impact on other cores.

> The trade-off is for case (2) above and it depends on whether it happens
> sufficiently often to be noticeable. I wouldn't think so.

Yes, to me it seems much more likely that we would decide to schedule a
task out while it was using SME rather than getting faults where the
overhead of reloading the state was noticable.

> > +	/*
> > +	 * Leaving SME enabled may leave this core contending with
> > +	 * other cores if we have a SMCU, disable whenever we enter
> > +	 * idle to avoid this.  Only do this if they're actually
> > +	 * enabled to avoid overhead in cases where we don't enter a
> > +	 * low enough power state to loose register state.
> > +	 */
> > +	if (system_supports_sme() &&
> > +	    (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)))
> > +		fpsimd_save_and_flush_cpu_state();
> > +}

> Do we always enter here via the idle thread? If we already had a thread
> switch we probably don't need to save the SME state again, only flush
> the state.

If we've actually switched the thread then TIF_FOREIGN_FPSTATE has been
set and we'll just check the flag and return for the save portion rather
than actually writing any register values out so the overhead should be
minimal.  It feels safer to check in case we get better at doing the
save lazily.

Otherwise arch_cpu_idle_enter() is called from psci_checker as well as
the idle thread, this code should not be relevant either way in that
context since it runs before userspace and AIUI it's trying to do the
same thing as the idle thread there anyway.

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ