Message-ID: <368e7626-c9bd-47be-bb42-f542dc3d67b7@intel.com>
Date: Fri, 13 Jun 2025 08:15:02 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: kan.liang@...ux.intel.com, peterz@...radead.org, mingo@...hat.com,
acme@...nel.org, namhyung@...nel.org, tglx@...utronix.de,
dave.hansen@...ux.intel.com, irogers@...gle.com, adrian.hunter@...el.com,
jolsa@...nel.org, alexander.shishkin@...ux.intel.com,
linux-kernel@...r.kernel.org
Cc: dapeng1.mi@...ux.intel.com, ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 05/12] perf/x86: Support XMM register for non-PEBS and
REGS_USER
> +static DEFINE_PER_CPU(void *, ext_regs_buf);
This should probably use one of the types in asm/fpu/types.h, not void*.
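Just as a sketch of what I mean (assuming the buffer only ever holds an
XSAVE image, so the per-CPU pointer can simply carry the xregs_state
type from asm/fpu/types.h):

	static DEFINE_PER_CPU(struct xregs_state *, ext_regs_buf);

That also lets the use sites below drop their casts.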
> +static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
> +{
> + void *xsave = (void *)ALIGN((unsigned long)per_cpu(ext_regs_buf, smp_processor_id()), 64);
I'd just align the allocation to avoid having to align it at runtime
like this.
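Something along these lines, as a sketch (assuming the allocation below
is made 64-byte aligned and the per-CPU pointer is typed as suggested
above):

	static void x86_pmu_get_ext_regs(struct x86_perf_regs *perf_regs, u64 mask)
	{
		/* Buffer is 64-byte aligned at allocation time; no ALIGN() needed. */
		struct xregs_state *xsave = per_cpu(ext_regs_buf, smp_processor_id());

		if (WARN_ON_ONCE(!xsave))
			return;

		xsaves_nmi(xsave, mask);
		/* ... rest as in the patch ... */
	}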
> + struct xregs_state *xregs_xsave = xsave;
> + u64 xcomp_bv;
> +
> + if (WARN_ON_ONCE(!xsave))
> + return;
> +
> + xsaves_nmi(xsave, mask);
> +
> + xcomp_bv = xregs_xsave->header.xcomp_bv;
> + if (mask & XFEATURE_MASK_SSE && xcomp_bv & XFEATURE_SSE)
> + perf_regs->xmm_regs = (u64 *)xregs_xsave->i387.xmm_space;
> +}
Could we please align the types on:
perf_regs->xmm_regs
and
xregs_xsave->i387.xmm_space
so that no casting is required?
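One possible direction, purely as a sketch (the other option would be to
expose a u64 view on the xsave side; either way, pick one element type):
fxregs_state::xmm_space is an array of u32, so if x86_perf_regs carried
a u32 pointer the assignment needs no cast:

	struct x86_perf_regs {
		struct pt_regs	regs;
		u32		*xmm_regs; /* same element type as fxregs_state::xmm_space */
	};

	...
		perf_regs->xmm_regs = xregs_xsave->i387.xmm_space;

Whichever direction you pick, the consumers of xmm_regs obviously need
to agree on the element size.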
> +static void reserve_ext_regs_buffers(void)
> +{
> + size_t size;
> + int cpu;
> +
> + if (!x86_pmu.ext_regs_mask)
> + return;
> +
> + size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> +
> + /* XSAVE feature requires 64-byte alignment. */
> + size += 64;
Does this actually work? ;)
Take a look at your system when it boots. You should see some helpful
pr_info()'s:
> [ 0.137276] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> [ 0.138799] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [ 0.139681] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [ 0.140576] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
> [ 0.141569] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
> [ 0.142804] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
> [ 0.143665] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
> [ 0.144436] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
> [ 0.145290] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64
> [ 0.146238] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512
> [ 0.146803] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
> [ 0.147397] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
> [ 0.147986] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
Notice that we're talking about a buffer which is ~2k in size when
AVX-512 is in play. Is 'size' above that big?
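Quick back-of-the-envelope: FXSAVE_SIZE (512) + XSAVE_HDR_SIZE (64) plus
the extra 64 bytes added for alignment is only 640 bytes, nowhere near
the 2440-byte compacted image in the log above.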
> + for_each_possible_cpu(cpu) {
> + per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
> + cpu_to_node(cpu));
> + if (!per_cpu(ext_regs_buf, cpu))
> + goto err;
> + }
Right now, any kmalloc() >=256b is going to be rounded up to a power-of-2
size and aligned to it, and thus also be 64b aligned, although that
rounding is just an implementation detail today. What _is_ guaranteed is
that kmalloc()s with power-of-2 sizes are naturally aligned, and
therefore also 64b aligned.
In other words, in practice these kzalloc_node() calls already return
64b-aligned memory rounded up to a power-of-2 size.
You can *guarantee* they'll be 64b aligned by just rounding 'size' up to
the next power of 2. That won't cost any extra memory because the
allocator is already rounding up internally.
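As a sketch (using roundup_pow_of_two() from linux/log2.h; the size
itself still needs fixing, see below):

	size = roundup_pow_of_two(size);

	for_each_possible_cpu(cpu) {
		/* Power-of-2 kmalloc sizes are naturally aligned, so >=64b here. */
		per_cpu(ext_regs_buf, cpu) = kzalloc_node(size, GFP_KERNEL,
							  cpu_to_node(cpu));
		if (!per_cpu(ext_regs_buf, cpu))
			goto err;
	}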
I can also grumble a little bit because this reinvents the wheel, and I
suspect it'll continue reinventing the wheel when it actually sizes the
buffer correctly.
We already have code in the kernel to dynamically allocate an fpstate:
fpstate_realloc(). It uses vmalloc() which wouldn't be my first choice
for this, but I also don't think it will hurt much. Looking at it, I'm
not sure how much of it you want to refactor and reuse, but you should
at least take a look.
There's also xstate_calculate_size(). That, you _definitely_ want to use
if you end up doing your own allocations.
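Roughly along these lines, as a sketch only (it assumes ext_regs_mask is
an XFEATURE mask and that xstate_calculate_size(), which today lives in
arch/x86/kernel/fpu/, gets exposed to the perf code in some form):

	/* Size of a compacted XSAVE image covering the captured features. */
	size = xstate_calculate_size(x86_pmu.ext_regs_mask, true);
	/* Power-of-2 kmalloc sizes are naturally (>=64b) aligned. */
	size = roundup_pow_of_two(size);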