Message-ID: <20250617144416.GY1613376@noisy.programming.kicks-ass.net>
Date: Tue, 17 Jun 2025 16:44:16 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Mark Rutland <mark.rutland@....com>
Cc: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, kan.liang@...ux.intel.com,
	mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
	tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
	adrian.hunter@...el.com, jolsa@...nel.org,
	alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
	ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:

> TBH, I don't think we can handle extended state in a generic way unless
> we treat this like a ptrace regset, and delegate the format of each
> specific register set to the architecture code.
> 
> On arm64, the behaviour is modal (with two different vector lengths for
> streaming/non-streaming SVE when SME is implemented), per-task
> configurable (with different vector lengths), can differ between
> host/guest for KVM, and some of the registers only exist in some
> configurations (e.g. the FFR only exists for SME if FA64 is
> implemented).

Well, much of this is by necessity architecture specific. But the
general form of vector registers is similar enough.

The main point is to not try and cram the vector registers into multiple
GP regs (sadly that is exactly what x86 started doing).

Anyway, your conditional length thing is 'fun' and has two solutions:

  - the arch can refuse to create per-cpu counters with SIMD samples, or

  - zero-pad all 'unobtainable state'.

Same when asking for wider vectors than the hardware supports; e.g.
asking for 512-bit wide registers on Intel clients will likely end up in
a lot of 0s for the high bits -- seeing how AVX512 is mostly a server
thing on Intel.
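
To make the zero-padding option concrete, here is a minimal userspace
sketch, not kernel code and not what the patch series does. The names
emit_vec_reg() and VEC_MAX_BYTES are hypothetical, and it assumes the low
bytes of the register come first in the sample slot. The requested width
fixes the slot size, and whatever the hardware (or the current mode)
cannot provide is written as zeros:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on one vector register, in bytes (512 bits). */
#define VEC_MAX_BYTES 64

/*
 * Copy one vector register into a sample slot of req_bytes.
 *
 * hw_bytes is how much state is actually obtainable; pass 0 (and hw may
 * then be NULL) for a register that does not exist in this configuration.
 * The slot is always filled completely, so the sample layout depends only
 * on what was requested, never on what the hardware happened to provide.
 *
 * Returns the number of bytes written, which is always req_bytes.
 */
static size_t emit_vec_reg(uint8_t *dst, size_t req_bytes,
                           const uint8_t *hw, size_t hw_bytes)
{
    assert(req_bytes <= VEC_MAX_BYTES);

    /* Never copy more than was requested or more than exists. */
    size_t n = hw_bytes < req_bytes ? hw_bytes : req_bytes;

    if (n)
        memcpy(dst, hw, n);

    /* Zero-pad the unobtainable high part of the slot. */
    memset(dst + n, 0, req_bytes - n);

    return req_bytes;
}
```

With this shape, asking for 64 bytes on a machine that only has 32-byte
registers yields 32 bytes of data followed by 32 zero bytes, and a
register that is absent in the current mode yields an all-zero slot.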


