linux-kernel - Re: [RFC PATCH 06/12] perf: Support extension of sample

Message-ID: <aFGBxBVFLnkmg3CP@J2N7QTR9R3>
Date: Tue, 17 Jun 2025 15:55:00 +0100
From: Mark Rutland <mark.rutland@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>, kan.liang@...ux.intel.com,
	mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
	tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
	adrian.hunter@...el.com, jolsa@...nel.org,
	alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
	ak@...ux.intel.com, zide.chen@...el.com, broonie@...nel.org
Subject: Re: [RFC PATCH 06/12] perf: Support extension of sample_regs

On Tue, Jun 17, 2025 at 04:44:16PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 17, 2025 at 03:24:01PM +0100, Mark Rutland wrote:
> 
> > TBH, I don't think we can handle extended state in a generic way unless
> > we treat this like a ptrace regset, and delegate the format of each
> > specific register set to the architecture code.
> > 
> > On arm64, the behaviour is modal (with two different vector lengths for
> > streaming/non-streaming SVE when SME is implemented), per-task
> > configurable (with different vector lengths), can differ between
> > host/guest for KVM, and some of the registers only exist in some
> > configurations (e.g. the FFR only exists for SME if FA64 is
> > implemented).
> 
> Well, much of this is per necessity architecture specific. But the
> general form of vector registers is similar enough.
> 
> The main point is to not try and cram the vector registers into multiple
> GP regs (sadly that is exactly what x86 started doing).

I see, sorry for the noise. I completely agree that we shouldn't cram
this stuff into GP regs.

> Anyway, your conditional length thing is 'fun' and has two solutions:
> 
>   - the arch can refuse to create per-cpu counters with SIMD samples, or
> 
>   - 0 pad all 'unobtainable state'.
> 
> Same when asking for wider vectors than the hardware supports; eg.
> asking for 512 wide registers on Intel clients will likely end up in a
> lot of 0s for the high bits -- seeing how AVX512 is mostly a server
> thing on Intel.

Yep, those options may work for us, but we'd need to think harder about
it. Our approach for ptrace and signals has been to have a header and
pack at the active vector length, so padding to a max width would be
different, but maybe it's fine.

Having another representation feels like a recipe for bugs.
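
As a rough illustration of the difference between the two layouts (this is
not from the patch series or any existing ABI; the struct, function names,
and sizes below are all hypothetical), a "header + packed at active length"
record and a "zero-padded to max width" record might be built like so:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NR_VREGS      4    /* number of vector registers sampled */
#define MAX_VL_BYTES  64   /* widest register the format can describe */

/* Hypothetical header for the packed layout. */
struct vreg_hdr {
	uint16_t nr_regs;   /* registers that follow the header */
	uint16_t vl_bytes;  /* active vector length, in bytes */
};

/*
 * Packed layout: header, then NR_VREGS registers of vl_bytes each.
 * 'regs' holds the registers back to back at the active length.
 * Returns bytes written, or 0 if vl_bytes is invalid or buf is too small.
 */
static size_t pack_active(uint8_t *buf, size_t len,
			  const uint8_t *regs, uint16_t vl_bytes)
{
	struct vreg_hdr hdr = { .nr_regs = NR_VREGS, .vl_bytes = vl_bytes };
	size_t need = sizeof(hdr) + (size_t)NR_VREGS * vl_bytes;

	if (vl_bytes == 0 || vl_bytes > MAX_VL_BYTES || len < need)
		return 0;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), regs, (size_t)NR_VREGS * vl_bytes);
	return need;
}

/*
 * Padded layout: no header; every register occupies MAX_VL_BYTES and
 * the bytes above the active length are zero.
 * Returns bytes written, or 0 if vl_bytes is invalid or buf is too small.
 */
static size_t pack_padded(uint8_t *buf, size_t len,
			  const uint8_t *regs, uint16_t vl_bytes)
{
	size_t need = (size_t)NR_VREGS * MAX_VL_BYTES;

	if (vl_bytes == 0 || vl_bytes > MAX_VL_BYTES || len < need)
		return 0;

	memset(buf, 0, need);
	for (size_t i = 0; i < NR_VREGS; i++)
		memcpy(buf + i * MAX_VL_BYTES, regs + i * vl_bytes, vl_bytes);
	return need;
}
```

With the packed form a consumer has to read the header to find each
register; with the padded form the offsets are fixed, but the record no
longer says which bytes were real and which were padding.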

Mark.