Message-ID: <be13b2ce-a8c1-4aa7-9ddf-9ae8daee0ae1@linux.intel.com>
Date: Tue, 17 Jun 2025 16:32:24 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Mark Rutland <mark.rutland@....com>, Peter Zijlstra <peterz@...radead.org>
Cc: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, mingo@...hat.com,
acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
dave.hansen@...ux.intel.com, irogers@...gle.com, adrian.hunter@...el.com,
jolsa@...nel.org, alexander.shishkin@...ux.intel.com,
linux-kernel@...r.kernel.org, ak@...ux.intel.com, zide.chen@...el.com,
broonie@...nel.org
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs
On 2025-06-17 10:55 a.m., Mark Rutland wrote:
> On Tue, Jun 17, 2025 at 04:44:16PM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
>>
>>> TBH, I don't think we can handle extended state in a generic way unless
>>> we treat this like a ptrace regset, and delegate the format of each
>>> specific register set to the architecture code.
>>>
>>> On arm64, the behaviour is modal (with two different vector lengths for
>>> streaming/non-streaming SVE when SME is implemented), per-task
>>> configurable (with different vector lengths), can differ between
>>> host/guest for KVM, and some of the registers only exist in some
>>> configurations (e.g. the FFR only exists for SME if FA64 is
>>> implemented).
>>
>> Well, much of this is per necessity architecture specific. But the
>> general form of vector registers is similar enough.
>>
>> The main point is to not try and cram the vector registers into multiple
>> GP regs (sadly that is exactly what x86 started doing).
>
> I see, sorry for the noise. I completely agree that we shouldn't cram
> this stuff into GP regs.
>
>> Anyway, your conditional length thing is 'fun' and has two solutions:
>>
>> - the arch can refuse to create per-cpu counters with SIMD samples, or
>>
>> - 0 pad all 'unobtainable state'.
>>
>> Same when asking for wider vectors than the hardware supports; eg.
>> asking for 512 wide registers on Intel clients will likely end up in a
>> lot of 0s for the high bits -- seeing how AVX512 is mostly a server
>> thing on Intel.
>
> Yep, those options may work for us, but we'd need to think harder about
> it. Our approach for ptrace and signals has been to have a header and
> pack at the active vector length, so padding to a max width would be
> different, but maybe it's fine.
>
> Having another representation feels like a recipe waiting to happen.
>
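To compare the two layouts discussed above, here is a minimal C sketch. The first is Peter's option of zero-padding each register to a requested width. The second is the ptrace/signal style Mark describes: a header followed by registers packed at the active vector length. All names here (pad_reg, pack_regs, struct hyp_vec_hdr) are hypothetical illustrations, not existing kernel or perf ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for the "header + pack at active length" layout. */
struct hyp_vec_hdr {
	uint16_t vl_bytes;	/* active vector length in bytes */
	uint16_t nr_regs;	/* number of registers that follow */
};
static_assert(sizeof(struct hyp_vec_hdr) == 4, "two u16s, no padding");

/*
 * Zero-padding layout (hypothetical): copy min(active, requested) bytes of
 * one register into a fixed-size slot and zero-fill the rest, so state the
 * hardware cannot provide (e.g. high bits of 512-bit regs) reads as 0.
 */
static size_t pad_reg(uint8_t *dst, size_t requested_bytes,
		      const uint8_t *src, size_t active_bytes)
{
	size_t n = active_bytes < requested_bytes ? active_bytes : requested_bytes;

	memcpy(dst, src, n);
	memset(dst + n, 0, requested_bytes - n);
	return requested_bytes;
}

/*
 * Header + packed layout (hypothetical): emit a header describing the active
 * vector length, then nr_regs registers of exactly vl_bytes each.
 * Returns the total number of bytes written.
 */
static size_t pack_regs(uint8_t *dst, const uint8_t *src,
			uint16_t vl_bytes, uint16_t nr_regs)
{
	struct hyp_vec_hdr hdr = { vl_bytes, nr_regs };
	size_t payload = (size_t)vl_bytes * nr_regs;

	memcpy(dst, &hdr, sizeof(hdr));
	memcpy(dst + sizeof(hdr), src, payload);
	return sizeof(hdr) + payload;
}
```

With padding, the record size depends only on the requested width; with a header, the consumer must read the header before it can locate each register.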
I'd like to make sure I understand correctly.
If we'd like an explicit width field for the predicate registers, is the
change below to struct perf_event_attr OK for ARM as well?
	__u16	sample_simd_pred_reg_words;
	__u16	sample_simd_pred_reg_intr;
	__u16	sample_simd_pred_reg_user;
	__u16	sample_simd_reg_words;
	__u64	sample_simd_reg_intr;
	__u64	sample_simd_reg_user;
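As a rough illustration of how these fields might size a sample, here is a C sketch. It assumes that each _words field gives the per-register width in 64-bit words and that each _intr/_user field is a bitmask selecting registers, in the same way as sample_regs_intr/sample_regs_user. Those semantics are assumptions, and struct hyp_simd_attr and hyp_simd_user_bytes() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the proposed perf_event_attr fields. Assumed:
 * *_words = per-register width in u64 words; *_intr / *_user = bitmask
 * of registers to sample, like sample_regs_intr / sample_regs_user.
 */
struct hyp_simd_attr {
	uint16_t sample_simd_pred_reg_words;
	uint16_t sample_simd_pred_reg_intr;
	uint16_t sample_simd_pred_reg_user;
	uint16_t sample_simd_reg_words;
	uint64_t sample_simd_reg_intr;
	uint64_t sample_simd_reg_user;
};
/* The four u16s fill 8 bytes, so the u64 masks stay naturally aligned. */
static_assert(sizeof(struct hyp_simd_attr) == 24, "no internal padding");

static unsigned popcount64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Bytes of SIMD payload in one user-regs sample, assuming every selected
 * register is emitted at the requested width (zero-padded if needed).
 */
static uint64_t hyp_simd_user_bytes(const struct hyp_simd_attr *a)
{
	uint64_t vec = (uint64_t)popcount64(a->sample_simd_reg_user) *
		       a->sample_simd_reg_words;
	uint64_t pred = (uint64_t)popcount64(a->sample_simd_pred_reg_user) *
			a->sample_simd_pred_reg_words;

	return (vec + pred) * sizeof(uint64_t);
}
```

For example, requesting all 32 ZMM registers at 8 words each plus 8 opmask registers at 1 word each gives (32 * 8 + 8 * 1) * 8 = 2112 bytes per sample.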
BTW: would it be easier for ARM if the _words fields were changed to _type?
You could then define types such as stream_sve, n_stream_sve, etc., and
the output layout would depend on the type rather than on the maximum
register length.
Thanks,
Kan