Message-ID: <Zts41l46Ufo0tk4Q@finisterre.sirena.org.uk>
Date: Fri, 6 Sep 2024 18:16:06 +0100
From: Mark Brown <broonie@...nel.org>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH] arm64/fpsimd: Ensure we don't contend a SMCU from idling CPUs
On Fri, Sep 06, 2024 at 03:56:52PM +0100, Catalin Marinas wrote:
> On Thu, Sep 05, 2024 at 07:34:41PM +0100, Mark Brown wrote:
> > On context switch the SMSTOP is issued as part of loading the state for
> > the task but we only do that when either returning to userspace or it's
> > a kernel thread with active FPSIMD usage. The idle thread is a kernel
> > thread with no FPSIMD usage so we don't touch the state. If we did the
> > SMSTOP unconditionally that'd mean that the optimisation where we don't
> > reload the FP state if we bounce through a kernel thread would be broken
> > while using SME which doesn't seem ideal, idling really does seem like a
> > meaningfully special case here.
> It depends on why the CPU is idling and we don't have the whole
> information in this function. If it was a wait on a syscall, we already
> discarded the state (but we only issue sme_smstop_sm() IIUC). With this
> patch, we'd disable the ZA storage as well, can it cause any performance
> issues by forcing the user to re-fault?
There will be some overhead from reloading the FP state, yes.
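For illustration only, a minimal user-space sketch of the reload-skipping optimisation discussed above. This is not the kernel's code: `task_model`, `cpu_model`, `model_switch_to()` and `model_invalidate()` are hypothetical names, and the model collapses "load state on return to userspace" into the switch itself. It assumes the state is considered live only while the CPU and the task still point at each other, so bouncing through a kernel thread costs nothing, while anything that discards the registers (such as an unconditional SMSTOP) forces a reload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task's FP bookkeeping; not a kernel structure. */
struct task_model {
    bool uses_fp;      /* false for kernel threads such as idle */
    int fp_cpu;        /* CPU whose registers last held this state, -1 if none */
    unsigned reloads;  /* how many times the state had to be reloaded */
};

/* Hypothetical model of one CPU's FP registers. */
struct cpu_model {
    int id;
    const struct task_model *fp_owner; /* task whose state is live, or NULL */
};

/* Models anything that throws away the live register state. */
static void model_invalidate(struct cpu_model *cpu)
{
    cpu->fp_owner = NULL;
}

/* Returns true if the FP state had to be reloaded for 'next'. */
static bool model_switch_to(struct cpu_model *cpu, struct task_model *next)
{
    if (!next->uses_fp)
        return false; /* kernel thread: leave the registers untouched */

    if (cpu->fp_owner == next && next->fp_cpu == cpu->id)
        return false; /* state is still live on this CPU: skip the reload */

    cpu->fp_owner = next;
    next->fp_cpu = cpu->id;
    next->reloads++;
    return true;
}
```

In this model, user -> idle -> user on one CPU reloads once in total; calling `model_invalidate()` while idle makes the second run of the user task pay for a reload, which is the overhead being acknowledged here.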
> If it's some short-lived wait for I/O on page faults, we may not want to
> disable streaming mode. I don't see this last case much different from
> switching to a kernel thread that doesn't use SME.
Yeah, that one is going to depend a lot on how performant the I/O is.
> So I think this leaves us with the case where a thread is migrated to a
> different CPU and the current CPU goes into idle for longer. But, again,
> we can't tell in the arch callback. The cpuidle driver calling into
> firmware is slightly better informed since it knows it's been idle (or
> going to be) for longer.
Yes, cpuidle is a whole different case - this is mainly targeted at the
case where that's been disabled in the kernel configuration (I was
considering making this conditional on !CPUIDLE, it was an oversight not
to do that in the first place).
> > > Also this looks hypothetical until we have some hardware to test it on,
> > > see how it would behave with a shared SME unit.
> > The specific performance impacts will depend on hardware (there'll
> > likely be some power impact even on things with a single FP unit per
> > PE) but given that keeping SM and ZA disabled when not in use is a
> > fairly strong recommendation in the programming model my inclination at
> > this point would be to program to the advertised model until we have
> > confirmation that the hardware actually behaves otherwise.
> Does the programming model talk about shared units (I haven't read it,
> not even sure where it is)? I hope one CPU cannot DoS another by not
> issuing SMSTOPs and the hardware has some provisions for sharing that
> guarantees forward progress on all CPUs. They may not be optimal but
> it's highly dependent on the software usage and hardware behaviour.
This is all getting totally into IMPDEF behaviour (and QoI issues) but
implementations are supposed to default to something which shares things
equally between all the users and guarantees forward progress. Anything
that doesn't guarantee forward progress would obviously be quite
specialist, and you'd hope that if the PE isn't actually issuing FP
instructions it won't impact anything. Even if things do work well you'll
still have the cost of keeping the unit on though.
> I'm inclined not to do anything at this stage until we see the actual
> hardware behaviour in practice.
Like I say my inclination is the opposite way round, though probably
with a check for !CONFIG_CPUIDLE.
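A rough user-space sketch of the behaviour being proposed, under stated assumptions. It is not the patch: `sme_cpu_model`, `model_smstop()` and `model_idle_enter()` are hypothetical names, the `cpuidle_enabled` parameter stands in for a build-time CONFIG_CPUIDLE check, and skipping the SMSTOP when both SM and ZA are already off is an assumption made for the model rather than something stated in the thread.

```c
#include <stdbool.h>

/* Hypothetical per-CPU view of the SME state; not a kernel structure. */
struct sme_cpu_model {
    bool sm_enabled;       /* models PSTATE.SM (streaming mode) */
    bool za_enabled;       /* models PSTATE.ZA (ZA storage) */
    unsigned smstop_count; /* number of modelled SMSTOPs issued */
};

/* Models an SMSTOP that disables both streaming mode and ZA. */
static void model_smstop(struct sme_cpu_model *cpu)
{
    cpu->sm_enabled = false;
    cpu->za_enabled = false;
    cpu->smstop_count++;
}

/*
 * Models the arch idle hook. Returns true if an SMSTOP was issued.
 * With cpuidle enabled the decision is left to the better informed
 * cpuidle path, so nothing is done here.
 */
static bool model_idle_enter(struct sme_cpu_model *cpu, bool cpuidle_enabled)
{
    if (cpuidle_enabled)
        return false;

    /* Assumption: nothing to do if SM and ZA are already disabled. */
    if (!cpu->sm_enabled && !cpu->za_enabled)
        return false;

    model_smstop(cpu);
    return true;
}
```

The trade-off in the model matches the one argued over in this thread: with `cpuidle_enabled` false, every idle entry with SM or ZA live drops that state, and the next user of it pays to set it up again.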