Message-ID: <aFGBxBVFLnkmg3CP@J2N7QTR9R3>
Date: Tue, 17 Jun 2025 15:55:00 +0100
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, kan.liang@...ux.intel.com,
	mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
	tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
	adrian.hunter@...el.com, jolsa@...nel.org,
	alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
	ak@...ux.intel.com, zide.chen@...el.com, broonie@...nel.org
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

On Tue, Jun 17, 2025 at 04:44:16PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
>
> > TBH, I don't think we can handle extended state in a generic way unless
> > we treat this like a ptrace regset, and delegate the format of each
> > specific register set to the architecture code.
> >
> > On arm64, the behaviour is modal (with two different vector lengths for
> > streaming/non-streaming SVE when SME is implemented), per-task
> > configurable (with different vector lengths), can differ between
> > host/guest for KVM, and some of the registers only exist in some
> > configurations (e.g. the FFR only exists for SME if FA64 is
> > implemented).
>
> Well, much of this is per necessity architecture specific. But the
> general form of vector registers is similar enough.
>
> The main point is to not try and cram the vector registers into multiple
> GP regs (sadly that is exactly what x86 started doing).

I see, sorry for the noise. I completely agree that we shouldn't cram
this stuff into GP regs.

> Anyway, your conditional length thing is 'fun' and has two solutions:
>
> - the arch can refuse to create per-cpu counters with SIMD samples, or
>
> - 0 pad all 'unobtainable state'.
>
> Same when asking for wider vectors than the hardware supports; eg.
> asking for 512 wide registers on Intel clients will likely end up in a
> lot of 0s for the high bits -- seeing how AVX512 is mostly a server
> thing on Intel.

Yep, those options may work for us, but we'd need to think harder about
it. Our approach for ptrace and signals has been to have a header and
pack at the active vector length, so padding to a max width would be
different, but maybe it's fine.
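
To make the difference between the two layouts concrete, here is a
minimal sketch in C. It is illustrative only: struct vec_sample_hdr,
pack_active() and pack_padded() are hypothetical names, and neither
layout is the actual perf ABI or the arm64 ptrace/signal format. The
first packs registers at the active vector length behind a small header;
the second writes fixed-width slots and zero-pads the high bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for the "header + packed data" layout. */
struct vec_sample_hdr {
	uint16_t nr_regs;	/* number of vector registers that follow */
	uint16_t vl_bytes;	/* bytes per register in this record */
};

/*
 * Layout A: header, then registers packed at the active vector length.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t pack_active(uint8_t *out, size_t out_len,
			  const uint8_t *regs, uint16_t nr_regs,
			  uint16_t active_vl)
{
	struct vec_sample_hdr hdr = { .nr_regs = nr_regs, .vl_bytes = active_vl };
	size_t payload = (size_t)nr_regs * active_vl;
	size_t need = sizeof(hdr) + payload;

	if (need > out_len)
		return 0;

	memcpy(out, &hdr, sizeof(hdr));
	memcpy(out + sizeof(hdr), regs, payload);
	return need;
}

/*
 * Layout B: no header, every register occupies max_vl bytes and the
 * bytes above the active vector length are zero.
 * Returns the number of bytes written, or 0 on error.
 */
static size_t pack_padded(uint8_t *out, size_t out_len,
			  const uint8_t *regs, uint16_t nr_regs,
			  uint16_t active_vl, uint16_t max_vl)
{
	size_t need = (size_t)nr_regs * max_vl;

	if (active_vl > max_vl || need > out_len)
		return 0;

	memset(out, 0, need);	/* zero-pad all unobtainable high bytes */
	for (uint16_t i = 0; i < nr_regs; i++)
		memcpy(out + (size_t)i * max_vl,
		       regs + (size_t)i * active_vl, active_vl);
	return need;
}
```

With two 16-byte registers and a 64-byte maximum, layout A produces
sizeof(struct vec_sample_hdr) + 32 bytes, while layout B produces 128
bytes, of which 96 are padding. A consumer of layout B cannot tell a
genuinely zero high half from state that was never there unless the
active length is conveyed some other way.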

Having another representation feels like a recipe for trouble.

Mark.