Message-ID: <0782de41-c8c4-4077-8498-651fb9a10ef5@linux.intel.com>
Date: Wed, 18 Jun 2025 06:10:20 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Mark Rutland <mark.rutland@....com>,
"Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, mingo@...hat.com,
acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
dave.hansen@...ux.intel.com, irogers@...gle.com, adrian.hunter@...el.com,
jolsa@...nel.org, alexander.shishkin@...ux.intel.com,
linux-kernel@...r.kernel.org, ak@...ux.intel.com, zide.chen@...el.com,
broonie@...nel.org
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs
On 2025-06-18 5:35 a.m., Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 04:32:24PM -0400, Liang, Kan wrote:
>
>>> Yep, those options may work for us, but we'd need to think harder about
>>> it. Our approach for ptrace and signals has been to have a header and
>>> pack at the active vector length, so padding to a max width would be
>>> different, but maybe it's fine.
>>>
>>> Having another representation feels like a recipe waiting to happen.
>>>
>>
>> I'd like to make sure I understand correctly.
>> If we'd like an explicit predicate register word, the below change in
>> struct perf_event_attr is OK for ARM as well, right?
>>
>> __u16 sample_simd_pred_reg_words;
>> __u16 sample_simd_pred_reg_intr;
>> __u16 sample_simd_pred_reg_user;
>> __u16 sample_simd_reg_words;
>> __u64 sample_simd_reg_intr;
>> __u64 sample_simd_reg_user;
>>
>> BTW: would that be easier for ARM if changing the _words to _type?
>> You may define some types like, stream_sve, n_stream_sve, etc.
>> The output will depend on the types, rather than the max length of
>> registers.
>
> I'm thinking what they're after is something like:
>
> PERF_SAMPLE_SIMD_REGS := {
> u16 nr_vectors;
> u16 vector_length;
> u16 nr_pred;
> u16 pred_length;
> u64 data[];
> }
Maybe we should use a mask instead of nr_vectors, because Dave mentioned
that XSAVES may fail.
Currently, perf reports all zeros in the failing case. But all zeros is
also a valid register value, so the tool cannot tell the two cases apart.
A mask would tell the tool which registers failed to be collected, so
the tool can give proper feedback to the end user.
PERF_SAMPLE_SIMD_REGS := {
	u64 vectors_mask;
	u16 vector_length;
	u64 pred_mask;
	u16 pred_length;
	u64 data[];
}
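For illustration only, here is a rough tool-side sketch of how such a
mask could be consumed. Nothing below is an existing ABI: the struct
name, the helper names, the assumption that the lengths count u64 words
per register, and the assumption that captured registers are packed in
ascending bit order are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed header of the proposed record (not a real ABI). */
struct simd_regs_hdr {
	uint64_t vectors_mask;   /* bit i set: vector register i was captured */
	uint16_t vector_length;  /* assumed unit: u64 words per vector register */
	uint64_t pred_mask;      /* bit i set: predicate register i was captured */
	uint16_t pred_length;    /* assumed unit: u64 words per predicate register */
};

/* Portable population count, to avoid compiler-specific builtins. */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Number of u64 words that data[] holds for this header. */
size_t simd_payload_words(const struct simd_regs_hdr *h)
{
	assert(h != NULL);
	return (size_t)popcount64(h->vectors_mask) * h->vector_length +
	       (size_t)popcount64(h->pred_mask) * h->pred_length;
}

/*
 * Return a pointer to vector register 'reg' inside data[], or NULL if
 * the mask says it was not captured. Assumes vectors come first in
 * data[] and are packed in ascending register order.
 */
const uint64_t *simd_vector_reg(const struct simd_regs_hdr *h,
				const uint64_t *data, unsigned int reg)
{
	uint64_t below;

	assert(h != NULL && data != NULL);
	if (reg >= 64 || !(h->vectors_mask & (1ULL << reg)))
		return NULL;

	/* Captured registers with a lower index precede this one. */
	below = h->vectors_mask & ((1ULL << reg) - 1);
	return data + (size_t)popcount64(below) * h->vector_length;
}
```

With this layout, a register whose bit is set but whose words are all
zero is a real value, while a register whose bit is clear was simply not
collected, which is the distinction the mask is meant to provide.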
Thanks,
Kan
>
> Where the output data also has a length. Such that even if we ask for
> 512 bit vectors, the thing is allowed to respond with say 128 bit
> vectors if that is all the machine has at that time.
>