[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47cbf0f3-9480-497a-ad88-5eb0c50dd88a@linux.intel.com>
Date: Wed, 20 Aug 2025 17:46:46 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
adrian.hunter@...el.com, jolsa@...nel.org,
alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
ak@...ux.intel.com, zide.chen@...el.com, mark.rutland@....com,
broonie@...nel.org, ravi.bangoria@....com, eranian@...gle.com
Subject: Re: [PATCH V3 05/17] perf/x86: Support XMM register for non-PEBS and
REGS_USER
On 8/19/2025 11:55 PM, Liang, Kan wrote:
>
> On 2025-08-19 6:39 a.m., Peter Zijlstra wrote:
>> On Fri, Aug 15, 2025 at 02:34:23PM -0700, kan.liang@...ux.intel.com wrote:
>>> From: Kan Liang <kan.liang@...ux.intel.com>
>>>
>>> Collecting the XMM registers in a PEBS record has been supported since
>>> the Icelake. But non-PEBS events don't support the feature. It's
>>> possible to retrieve the XMM registers from the XSAVE for non-PEBS.
>>> Add it to make the feature complete.
>>>
>>> To utilize the XSAVE, a 64-byte aligned buffer is required. Add a
>>> per-CPU ext_regs_buf to store the vector registers. The size of the
>>> buffer is ~2K. kzalloc_node() is used because there's a _guarantee_
>>> that all kmalloc()'s with powers of 2 are naturally aligned and also
>>> 64b aligned.
>>>
>>> Extend the support for both REGS_USER and REGS_INTR. For REGS_USER, the
>>> perf_get_regs_user() returns the regs from the task_pt_regs(current),
>>> which is struct pt_regs. Need to move it to local struct x86_perf_regs
>>> x86_user_regs.
>>> For PEBS, the HW support is still preferred. The XMM should be retrieved
>>> from PEBS records.
>>>
>>> There could be more vector registers supported later. Add ext_regs_mask
>>> to track the supported vector register group.
>>
>> I'm a little confused... *again* :-)
>>
>> Specifically, we should consider two sets of registers:
>>
>> - the live set, as per the CPU (XSAVE)
>> - the stored set, as per x86_task_fpu()
>>
>> regs_intr should always get a copy of the live set; however
>> regs_user should not. It might need a copy of the x86_task_fpu() instead
>> of the live set, depending on TIF_NEED_FPU_LOAD (more or less, we need
>> another variable set in kernel_fpu_begin_mask() *after*
>> save_fpregs_to_fpstate() is completed).
>>
>> I don't see this code make this distinction.
>>
>> Consider getting a sample while the kernel is doing some avx enhanced
>> crypto and such.
> The regs_user only needs a set when the NMI hits the user mode
> (user_mode(regs)) or a non-kernel thread (!(current->flags &
> PF_KTHREAD)). The live set is good enough for both cases.
It's fine if NMI hits user mode, but if NMI hits the kernel mode
(!(current->flags &PF_KTHREAD)), won't the kernel space SIMD/eGPR regs be
exposed to user space for user-regs option? I'm not sure if kernel space
really use these SIMD/eGPR regs right now, but it seems a risk.
>
> I think the kernel crypto should be to a kernel thread (current->flags &
> PF_KTHREAD). If so, the regs_user should return NULL.
>
> Thanks,
> Kan
>
Powered by blists - more mailing lists