lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250617102813.GS1613376@noisy.programming.kicks-ass.net>
Date: Tue, 17 Jun 2025 12:28:13 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: kan.liang@...ux.intel.com, mingo@...hat.com, acme@...nel.org,
	namhyung@...nel.org, tglx@...utronix.de,
	dave.hansen@...ux.intel.com, irogers@...gle.com,
	adrian.hunter@...el.com, jolsa@...nel.org,
	alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
	ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

On Tue, Jun 17, 2025 at 05:49:13PM +0800, Mi, Dapeng wrote:
> 
> On 6/17/2025 4:14 PM, Peter Zijlstra wrote:
> > On Fri, Jun 13, 2025 at 06:49:37AM -0700, kan.liang@...ux.intel.com wrote:
> >> From: Kan Liang <kan.liang@...ux.intel.com>
> >>
> >> More regs may be required in a sample, e.g., the vector registers. The
> >> current sample_regs_XXX has run out of space.
> >>
> >> Add sample_ext_regs_intr/user[2] in the struct perf_event_attr. It's used
> >> as a bitmap for the extension regs. There will be more than 64 registers
> >> added.
> >> Add a new flag PERF_PMU_CAP_EXTENDED_REGS2 to indicate the PMU which
> >> supports sample_ext_regs_intr/user.
> >>
> >> Extend the perf_reg_validate() to support the validation of the
> >> extension regs.
> >>
> >> Extend the perf_reg_value() to retrieve the extension regs. The regs may
> >> be larger than u64. Add two parameters to store the pointer and size.
> >> Add a dedicated perf_output_sample_ext_regs() to dump the extension
> >> regs.
> >>
> >> This is just a generic support for the extension regs. Any attempts to
> >> manipulate the extension regs will error out, until the driver-specific
> >> supports are implemented, which will be done in the following patch.
> >>
> >> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> >> index 78a362b80027..e22ba72efcdb 100644
> >> --- a/include/uapi/linux/perf_event.h
> >> +++ b/include/uapi/linux/perf_event.h
> >> @@ -382,6 +382,10 @@ enum perf_event_read_format {
> >>  #define PERF_ATTR_SIZE_VER6			120	/* Add: aux_sample_size */
> >>  #define PERF_ATTR_SIZE_VER7			128	/* Add: sig_data */
> >>  #define PERF_ATTR_SIZE_VER8			136	/* Add: config3 */
> >> +#define PERF_ATTR_SIZE_VER9			168	/* Add: sample_ext_regs_intr */
> >> +							/* Add: sample_ext_regs_user */
> >> +
> >> +#define PERF_ATTR_EXT_REGS_SIZE			2
> >>  
> >>  /*
> >>   * 'struct perf_event_attr' contains various attributes that define
> >> @@ -543,6 +547,10 @@ struct perf_event_attr {
> >>  	__u64	sig_data;
> >>  
> >>  	__u64	config3; /* extension of config2 */
> >> +
> >> +	/* extension of sample_regs_XXX */
> >> +	__u64	sample_ext_regs_intr[PERF_ATTR_EXT_REGS_SIZE];
> >> +	__u64	sample_ext_regs_user[PERF_ATTR_EXT_REGS_SIZE];
> >>  };
> > Did anybody read this email?
> >
> >   https://lkml.kernel.org/r/20250416155327.GD17910@noisy.programming.kicks-ass.net
> >
> > The current regs interface really was designed for regular registers,
> > and trying to squish SIMD registers into it is a trainwreck.
> >
> > Not to mention that AAAARGHH64 and Risc-V have vector widths up to 2048
> > bit.
> 
> Yes, we followed this discussion. In sample_ext_regs_intr/user[], each bit
> represents an extended register regardless of how much bits does the
> register have.  At the beginning we added a item "sample_simd_reg_words" to
> represent the bit-width of the corresponding extended register, but we
> found it's unnecessary since the regs bitmap is fixed for a specific arch

So I disagree. Fundamentally we have only 32 SIMD registers on x86. We
should not have more bits than that.

> and the arch-specific code would know how many bits for the certain regs,
> e.g., on x86 platform, the bit 0 would represent YMMH0 in this patchset ,
> then the x86 specific perf code would know its bit-wdith is 128bits.

This is nonsense; YMMH0 is not a register. It should never be in this
array.

> The reason that we define an array with 2 u64 words is that we plan to
> support YMM (16 bits) + APX (16 bits) + OPMASK (8 bits) + ZMM (32 bits) +
> SSP (1 bit) regs which needs 73 bits and one u64 word is not enough.

This is insane. So now you're having 16 XMM 'regs', 16 YMMH 'regs' and
32 ZMMH 'regs' for a total of 64 bits that should have been just 32.

Suppose we're going to be doing AVX-1024, because awesome. That means we
need another 32 bits to denote whatever letter comes after 'Z'.

So no, this idiocy stops now.

We're going to do a sane SIMD register set with variable width, and
reclaim the XMM regs from the normal set.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ