Message-ID: <d095993b-01e4-4a7a-bb50-363db92df007@linux.intel.com>
Date: Fri, 13 Jun 2025 13:51:33 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Dave Hansen <dave.hansen@...el.com>, peterz@...radead.org,
 mingo@...hat.com, acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
 dave.hansen@...ux.intel.com, irogers@...gle.com, adrian.hunter@...el.com,
 jolsa@...nel.org, alexander.shishkin@...ux.intel.com,
 linux-kernel@...r.kernel.org
Cc: dapeng1.mi@...ux.intel.com, ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 05/12] perf/x86: Support XMM register for non-PEBS and
 REGS_USER



On 2025-06-13 11:15 a.m., Dave Hansen wrote:
>> +static DEFINE_PER_CPU(void *, ext_regs_buf);
> 
> This should probably use one of the types in asm/fpu/types.h, not void*.
> 
>> +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
>> +{
>> +	void *xsave = (void *)ALIGN((unsigned long)per_cpu(ext_regs_buf, smp_processor_id()), 64);
> 
> I'd just align the allocation to avoid having to align it at runtime
> like this.
> 
>> +	struct xregs_state *xregs_xsave = xsave;
>> +	u64 xcomp_bv;
>> +
>> +	if (WARN_ON_ONCE(!xsave))
>> +		return;
>> +
>> +	xsaves_nmi(xsave, mask);
>> +
>> +	xcomp_bv = xregs_xsave->header.xcomp_bv;
>> +	if (mask & XFEATURE_MASK_SSE && xcomp_bv & XFEATURE_SSE)
>> +		perf_regs->xmm_regs = (u64 *)xregs_xsave->i387.xmm_space;
>> +}
> 
> Could we please align the types on:
> 
> 	perf_regs->xmm_regs
> and
> 	xregs_xsave->i387.xmm_space
> 
> so that no casting is required?
> 
>> +static void reserve_ext_regs_buffers(void)
>> +{
>> +	size_t size;
>> +	int cpu;
>> +
>> +	if (!x86_pmu.ext_regs_mask)
>> +		return;
>> +
>> +	size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
>> +
>> +	/* XSAVE feature requires 64-byte alignment. */
>> +	size += 64;
> 
> Does this actually work? ;)
> 
> Take a look at your system when it boots. You should see some helpful
> pr_info()'s:
> 
>> [    0.137276] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
>> [    0.138799] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
>> [    0.139681] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
>> [    0.140576] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
>> [    0.141569] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
>> [    0.142804] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
>> [    0.143665] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
>> [    0.144436] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
>> [    0.145290] x86/fpu: xstate_offset[5]:  832, xstate_sizes[5]:   64
>> [    0.146238] x86/fpu: xstate_offset[6]:  896, xstate_sizes[6]:  512
>> [    0.146803] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
>> [    0.147397] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]:    8
>> [    0.147986] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
> 
> Notice that we're talking about a buffer which is ~2k in size when
> AVX-512 is in play. Is 'size' above that big?
> 
>> +	for_each_possible_cpu(cpu) {
>> +		per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
>> +							  cpu_to_node(cpu));
>> +		if (!per_cpu(ext_regs_buf, cpu))
>> +			goto err;
>> +	}
> 
> Right now, any kmalloc() >=256b is rounded up and aligned to a power of
> 2, and is thus also 64b aligned, although this is just an implementation
> detail today. There's a _guarantee_ that all kmalloc()s with power-of-2
> sizes are naturally aligned and thus also 64b aligned.
> 
> In other words, in practice, these kzalloc_node() are 64b aligned and
> rounded up to a power of 2 size.
> 
> You can *guarantee* they'll be 64b aligned by just rounding size up to
> the next power of 2. This won't increase the size because they're
> already being rounded up internally.
> 
> I can also grumble a little bit because this reinvents the wheel, and I
> suspect it'll continue reinventing the wheel when it actually sizes the
> buffer correctly.
> 
> We already have code in the kernel to dynamically allocate an fpstate:
> fpstate_realloc(). It uses vmalloc() which wouldn't be my first choice
> for this, but I also don't think it will hurt much. Looking at it, I'm
> not sure how much of it you want to refactor and reuse, but you should
> at least take a look.
>
> There's also xstate_calculate_size(). That, you _definitely_ want to use
> if you end up doing your own allocations.
> 
fpstate_realloc() seems too complicated for this simple usage.
I will use xstate_calculate_size() to get the size of each component;
it returns the real size, with 64-byte alignment already applied per
component. Then vmalloc() is used to allocate the buffer. I think that
should be good enough for this usage.

Thanks,
Kan

