Message-ID: <mhng-D7782B90-52E0-44DB-89A8-70079A8751E7@palmerdabbelt-mac>
Date: Thu, 07 Aug 2025 12:17:38 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: Marc Zyngier <maz@...nel.org>
CC: Catalin Marinas <catalin.marinas@....com>, Mark Rutland <mark.rutland@....com>,
Will Deacon <will@...nel.org>, oliver.upton@...ux.dev, james.morse@....com, cohuck@...hat.com,
anshuman.khandual@....com, palmerdabbelt@...a.com, lpieralisi@...nel.org, kevin.brodsky@....com,
scott@...amperecomputing.com, broonie@...nel.org, james.clark@...aro.org, yeoreum.yun@....com,
joey.gouly@....com, huangxiaojia2@...wei.com, yebin10@...wei.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: Expose CPUECTLR{,2}_EL1 via sysfs

On Thu, 07 Aug 2025 11:06:27 PDT (-0700), Marc Zyngier wrote:
> On Thu, 07 Aug 2025 18:26:29 +0100,
> Palmer Dabbelt <palmer@...belt.com> wrote:
>>
>> On Thu, 07 Aug 2025 01:08:26 PDT (-0700), Marc Zyngier wrote:
>> > On Wed, 06 Aug 2025 20:48:13 +0100,
>> > Palmer Dabbelt <palmer@...belt.com> wrote:
>> >>
>> >> From: Palmer Dabbelt <palmerdabbelt@...a.com>
>> >>
>> >> We've found that some of our workloads run faster when some of these
>> >> fields are set to non-default values on some of the systems we're trying
>> >> to run those workloads on. This allows us to set those values via
>> >> sysfs, so we can do workload/system-specific tuning.
>> >>
>> >> Signed-off-by: Palmer Dabbelt <palmerdabbelt@...a.com>
>> >> ---
>> >> I've only really smoke tested this, but I figured I'd send it along because I'm
>> >> not sure if this is even a sane thing to be doing -- these extended control
>> >> registers have some wacky stuff in them, so maybe they're not exposed to
>> >> userspace on purpose. IIUC firmware can gate these writes, though, so it
>> >> should be possible for vendors to forbid the really scary values.
>> >
>> > That's really wrong.
>> >
>> > For a start, these encodings fall into the IMPDEF range. They won't
>> > exist on non-ARM implementations.
>>
>> OK, and that's because it says "Provides additional IMPLEMENTATION
>> DEFINED configuration and control options for the processor." at the
>> start of the manual page? Sorry, I'm kind of new to trying to read
>> the Arm specs -- I thought just the meaning of the values was
>> changing, but I probably just didn't read enough specs.
>
> The architecture defines a range described in D24.2.162 (in the L.b
> revision of the ARM ARM) which is reserved for IMPDEF purposes.
>
> What these registers do is not defined, and there is no standard
> across implementations. This really is for chicken bits and other fun
> stuff. Most of them will simply generate an UNDEF, because they don't
> pass the decode stage. But for all we know, there is a bit in there
> that turns NOP into the HCF instruction -- or better.
>
> So exposing any of that stuff for any given CPU is dangerous. And
> exposing any of it on *all* CPUs is a bit like swallowing a powered
> chainsaw (don't).
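
A minimal sketch of the range check described above. It assumes the
reserved IMPDEF system register space is op0 == 3 with CRn == 11 or 15,
and packs encodings the way Linux's sys_reg() macro does. The helper
names sysreg_enc() and sysreg_is_impdef() are made up for illustration,
and S3_0_C15_C1_4 in the comment is only an assumed example encoding for
CPUECTLR_EL1 on some Arm-designed cores; check the relevant TRM.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack an AArch64 system register encoding using the same field
 * positions as Linux's sys_reg(): op0[20:19] op1[18:16] CRn[15:12]
 * CRm[11:8] op2[7:5].
 */
static inline uint32_t sysreg_enc(uint32_t op0, uint32_t op1, uint32_t crn,
                                  uint32_t crm, uint32_t op2)
{
    return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/*
 * Assumption: a system register is in the IMPLEMENTATION DEFINED space
 * when op0 == 3 and CRn is 11 or 15. Anything matching has no
 * architected meaning, e.g. S3_0_C15_C1_4 (assumed CPUECTLR_EL1 on some
 * cores) would match, while SCTLR_EL1 (S3_0_C1_C0_0) would not.
 */
static inline bool sysreg_is_impdef(uint32_t enc)
{
    uint32_t op0 = (enc >> 19) & 0x3;
    uint32_t crn = (enc >> 12) & 0xf;

    return op0 == 3 && (crn == 11 || crn == 15);
}
```

A check like this only says that an encoding is unarchitected. It says
nothing about whether the register exists on a given CPU or what its
bits do.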

OK, makes sense.

>> > Next, this will catch fire as a guest, either because the hypervisor
>> > will simply refuse to entertain letting it access registers that have
>> > no definition, or because the VM has been migrated from one
>> > implementation to another, and you have no idea what this is doing on
>> > the new target.
>>
>> Ya, makes sense.
>>
>> >> That said, we do see some performance improvements here on real workloads. So
>> >> we're hoping to roll some of this tuning work out more widely, but we also
>> >> don't want to adopt some internal interface. Thus it'd make our lives easier
>> >> if we could twiddle these bits in a standard way.
>> >
>> > Honestly, this is the sort of bring-up stuff that is better kept in
>> > your private sandbox, and doesn't really help in general.
>>
>> So we're not doing bringup (or at least not doing what I'd call
>> bringup) here, the theory is that we just get better performance on
>> different workloads with different tunings. That's all still a little
>> early, but if the data holds we'd want to be setting these based on
>> what workload is running (ie, not just some static tuning for
>> everything).
>
> In general, none of that crap is safe to turn on and off at random
> times. You really want to talk to your implementer to find out. And if
> it is, the firmware is probably the place to handle that.

Ya, if it's generally not expected to be sane to modify these at runtime
then it seems sane to just hide them behind a firmware interface. Then
it's really up to the firmware to proactively expose the bits that are
useful, and it's inherently vendor-specific.

>> That said, part of the reason I just sent this as-is is because I was
>> sort of expecting the answer to be "no" here. No big deal if that's
>> the case, we can figure out some other way to solve the problem.
>> Happy to throw some time in to making some more generic flavor of
>> this, though...
>
> I have no idea how we can achieve that, given that there is no
> architected definition for any of these registers.

I'd basically have some interface for getting/setting the registers that
the kernel exposes (gated behind whatever tests we'd need to make sure
the registers are accessible), and then some userspace program that deals
with the implementation-specific behavior. It'd probably just devolve
into some database of known implementations with what the bits do, with
some attempt at mapping them to generic behavior -- though even that's
kind of clunky, as something like "this tunes some prefetcher to smell
different" doesn't really help a ton.
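
A rough sketch of what that userspace database could look like, purely
as an illustration. It assumes the table is keyed on the implementer and
part number fields of MIDR_EL1. Every name here (impdef_field,
impl_entry, lookup_field, update_field) is hypothetical, and so is the
single table entry: the implementer/part values and the "prefetch_mode"
field do not describe any real CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MIDR_EL1 layout: Implementer is bits [31:24], PartNum is bits [15:4]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* One named field inside an implementation-specific control register. */
struct impdef_field {
    const char *name;
    unsigned shift;   /* bit position of the field's LSB */
    unsigned width;   /* field width in bits, assumed < 64 */
};

/* One known implementation and the fields we know how to tune on it. */
struct impl_entry {
    uint8_t implementer;
    uint16_t partnum;
    const struct impdef_field *fields;
    size_t nfields;
};

/* HYPOTHETICAL data: not taken from any real TRM. */
static const struct impdef_field example_fields[] = {
    { "prefetch_mode", 4, 2 },
};

static const struct impl_entry known_impls[] = {
    { 0x7f, 0xabc, example_fields, 1 },
};

/* Find a named field for the CPU identified by midr, or NULL if unknown. */
static const struct impdef_field *lookup_field(uint64_t midr, const char *name)
{
    for (size_t i = 0; i < sizeof(known_impls) / sizeof(known_impls[0]); i++) {
        const struct impl_entry *e = &known_impls[i];

        if (e->implementer != MIDR_IMPLEMENTER(midr) ||
            e->partnum != MIDR_PARTNUM(midr))
            continue;
        for (size_t j = 0; j < e->nfields; j++)
            if (strcmp(e->fields[j].name, name) == 0)
                return &e->fields[j];
    }
    return NULL;
}

/* Update one field in a register image; returns -1 if val does not fit. */
static int update_field(uint64_t *reg, const struct impdef_field *f,
                        uint64_t val)
{
    uint64_t mask = (UINT64_C(1) << f->width) - 1;

    if (val > mask)
        return -1;
    *reg = (*reg & ~(mask << f->shift)) | (val << f->shift);
    return 0;
}
```

Unknown implementations fall through to NULL, so such a tool would
refuse to touch anything it has no entry for. How the register image is
actually read and written back is left out, since that is exactly the
interface under discussion.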

If it's a firmware-gated thing, though, then it's probably going to just
end up as some vendor-specific firmware widget that we go fumble around
in ACPI to mangle...

>
> M.