Message-ID: <60c18595-c6a8-4c39-98fe-0822755fbdb7@linux.intel.com>
Date: Tue, 17 Jun 2025 09:52:12 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
 tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
 adrian.hunter@...el.com, jolsa@...nel.org,
 alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
 dapeng1.mi@...ux.intel.com, ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 00/12] Support vector and more extended registers in
 perf



On 2025-06-17 4:24 a.m., Peter Zijlstra wrote:
> On Fri, Jun 13, 2025 at 06:49:31AM -0700, kan.liang@...ux.intel.com wrote:
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> Starting from the Intel Ice Lake, the XMM registers can be collected in
>> a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP and APX, will
>> be added in the upcoming Architecture PEBS as well. But it requires the
>> hardware support.
>>
>> The patch set provides a software solution to mitigate the hardware
>> requirement. It utilizes the XSAVES command to retrieve the requested
>> registers in the overflow handler. The feature isn't limited to the PEBS
>> event or specific platforms anymore.
>> The hardware solution (if available) is still preferred, since it has
>> low overhead (especially with the large PEBS) and is more accurate.
>>
>> In theory, the solution should work for all X86 platforms. But I only
>> have newer Intel platforms to test. The patch set only enables the
>> feature for Intel Ice Lake and later platforms.
>>
>> Open:
>> The new registers include YMM, ZMM, OPMASK, SSP, and APX.
>> The sample_regs_user/intr bitmaps have run out of bits. A new field in the
>> struct perf_event_attr is required for the registers.
>> There could be several options as below for the new field.
>>
>> - Follow a similar format to XSAVES. Introduce the below fields to store
>>   the bitmap of the registers.
>>   struct perf_event_attr {
>>         ...
>>         __u64   sample_ext_regs_intr[2];
>>         __u64   sample_ext_regs_user[2];
>>         ...
>>   }
>>   Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
>>            ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
>>   For example, if a user wants YMM8, the perf tool needs to set the
>>   corresponding bits of XMM8 and YMMH8, and reconstruct the result.
>>   The method is similar to the existing method for
>>   sample_regs_user/intr, and matches the XSAVES format.
>>   The kernel doesn't need to do extra configuration and reconstruction.
>>   It's implemented in the patch set.
>>
>> - Similar to the above method. But the fields are the bitmap of the
>>   complete registers, e.g., YMM (16 bits), APX (16 bits),
>>   OPMASK (8 bits), ZMM (32 bits), SSP.
>>   The kernel needs to do extra configuration and reconstruction,
>>   which may bring extra overhead.
>>
>> - Combine the XMM, YMM, and ZMM. So all the registers can be put into
>>   one u64 field.
>>         ...
>>         union {
>>                 __u64 sample_ext_regs_intr;   //sample_ext_regs_user is similar
>>                 struct {
>>                         __u32 vector_bitmap;
>>                         __u32 vector_type   : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
>>                               apx_bitmap    : 16,
>>                               opmask_bitmap : 8,
>>                               ssp_bitmap    : 1,
>>                               reserved      : 4;
>>                 };
>>         };
>>         ...
>>   For example, if the YMM8-15 is required,
>>   vector_bitmap: 0x0000ff00
>>   vector_type: 0x2
>>   This method can save two __u64 in the struct perf_event_attr.
>>   But it's not straightforward since it mixes the type and bitmap.
>>   The kernel also needs to do extra configuration and reconstruction.
>>
>> Please let me know if there are more ideas.
> 
> https://lkml.kernel.org/r/20250416155327.GD17910@noisy.programming.kicks-ass.net
>

It's similar to the third method, but it uses the words field in place of
the type. It also leaves more space in case we add more SIMD registers in
the future. I will implement it in V2.
> comes to mind. Combine that with a rule that reclaims the XMM register
> space from perf_event_x86_regs when sample_simd_reg_words != 0, and then
> we can put APX and SSP there.

OK. So sample_simd_reg_words now has a second meaning: it's also used
as a flag to tell whether the old format is in use.

If so, I think it may be better to have a dedicated sample_simd_reg_flags
field.

For example,

#define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY		0x0
#define SAMPLE_SIMD_FLAGS_FORMAT_WORDS		0x1

	__u8 sample_simd_reg_flags;
	__u8 sample_simd_reg_words;
	__u64 sample_simd_reg_intr;
	__u64 sample_simd_reg_user;

If sample_simd_reg_flags != 0, the XMM space is reclaimed for APX and SSP.
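
To make the idea a bit more concrete, here is a rough sketch of how the
check could look. Only the two defines and the four field names come from
this mail. The struct name simd_attr_sketch and both helpers are made up
for illustration, and the validity rule (the legacy format carries no words
value and no SIMD bitmaps; the words format needs a non-zero words value) is
just my assumption, not something that is settled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as proposed in this mail. */
#define SAMPLE_SIMD_FLAGS_FORMAT_LEGACY	0x0
#define SAMPLE_SIMD_FLAGS_FORMAT_WORDS	0x1

/*
 * Hypothetical stand-in for the proposed perf_event_attr fields.
 * This is NOT an existing uAPI structure.
 */
struct simd_attr_sketch {
	uint8_t  sample_simd_reg_flags;
	uint8_t  sample_simd_reg_words;
	uint64_t sample_simd_reg_intr;
	uint64_t sample_simd_reg_user;
};

/*
 * True when the XMM slots in perf_event_x86_regs could be reused
 * for APX and SSP, i.e. when the legacy format is not in use.
 */
static bool xmm_space_reclaimed(const struct simd_attr_sketch *attr)
{
	return attr->sample_simd_reg_flags != SAMPLE_SIMD_FLAGS_FORMAT_LEGACY;
}

/*
 * Assumed sanity rule (illustration only):
 *  - legacy format: no words value and no SIMD bitmaps are allowed
 *  - words format:  a non-zero words value is required
 *  - anything else: unknown flag value, reject
 */
static bool simd_attr_valid(const struct simd_attr_sketch *attr)
{
	switch (attr->sample_simd_reg_flags) {
	case SAMPLE_SIMD_FLAGS_FORMAT_LEGACY:
		return attr->sample_simd_reg_words == 0 &&
		       attr->sample_simd_reg_intr == 0 &&
		       attr->sample_simd_reg_user == 0;
	case SAMPLE_SIMD_FLAGS_FORMAT_WORDS:
		return attr->sample_simd_reg_words != 0;
	default:
		return false;
	}
}
```

With a dedicated flags field, sample_simd_reg_words keeps a single meaning,
and the reclaim decision does not depend on whether any SIMD register was
actually requested.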

Does it make sense?

Thanks,
Kan


