Message-ID: <20250617144416.GY1613376@noisy.programming.kicks-ass.net>
Date: Tue, 17 Jun 2025 16:44:16 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Mark Rutland <mark.rutland@....com>
Cc: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, kan.liang@...ux.intel.com,
mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
adrian.hunter@...el.com, jolsa@...nel.org,
alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs
On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
> TBH, I don't think we can handle extended state in a generic way unless
> we treat this like a ptrace regset, and delegate the format of each
> specific register set to the architecture code.
>
> On arm64, the behaviour is modal (with two different vector lengths for
> streaming/non-streaming SVE when SME is implemented), per-task
> configurable (with different vector lengths), can differ between
> host/guest for KVM, and some of the registers only exist in some
> configurations (e.g. the FFR only exists for SME if FA64 is
> implemented).
Well, much of this is by necessity architecture-specific. But the
general form of vector registers is similar enough.
The main point is to not try and cram the vector registers into multiple
GP regs (sadly that is exactly what x86 started doing).
Anyway, your conditional length thing is 'fun' and has two solutions:
- the arch can refuse to create per-cpu counters with SIMD samples, or
- 0 pad all 'unobtainable state'.
The same applies when asking for wider vectors than the hardware
supports; e.g. asking for 512-bit-wide registers on Intel clients will
likely end up in a lot of 0s for the high bits -- seeing how AVX512 is
mostly a server thing on Intel.
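To make the zero-padding option concrete, here is a minimal sketch in plain
C. It is an illustration only: sample_copy_vreg() and its parameters are
hypothetical names, not an existing perf or kernel interface. It assumes the
sample slot is sized for the requested width, and that an unobtainable
register is signalled by a NULL source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): fill one vector-register slot
 * in a sample buffer.
 *
 * dst       - slot in the sample, req_bytes long (the width userspace asked for)
 * src       - register contents as the hardware provides them, or NULL if
 *             the state is unobtainable in the current mode/configuration
 * hw_bytes  - number of valid bytes at src (ignored when src is NULL)
 *
 * Whatever the hardware cannot supply is written as zeroes, so the slot
 * always has the requested size.
 */
static void sample_copy_vreg(uint8_t *dst, size_t req_bytes,
                             const uint8_t *src, size_t hw_bytes)
{
    size_t n = 0;

    /* Copy at most the requested width; never read past the valid bytes. */
    if (src)
        n = hw_bytes < req_bytes ? hw_bytes : req_bytes;

    if (n)
        memcpy(dst, src, n);

    /* Zero-pad the high bytes, or the whole slot if nothing was available. */
    memset(dst + n, 0, req_bytes - n);
}
```

With this shape, a request for 64-byte registers on hardware that only has
32-byte ones yields 32 bytes of data followed by 32 zero bytes, and a
register that does not exist in the current configuration yields an
all-zero slot, so the sample layout stays fixed either way.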