Message-ID: <e5193801-9160-4e05-8d02-eb129487437c@arm.com>
Date: Mon, 4 Aug 2025 22:49:14 +0100
From: Suzuki K Poulose <suzuki.poulose@....com>
To: James Clark <james.clark@...aro.org>,
Alexandru Elisei <alexandru.elisei@....com>
Cc: Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>,
Anshuman Khandual <Anshuman.Khandual@....com>,
Rob Herring <Rob.Herring@....com>, Robin Murphy <Robin.Murphy@....com>,
linux-arm-kernel@...ts.infradead.org, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH 0/3] perf: arm_spe: Add support for SPE VM interface
On 04/08/2025 17:00, James Clark wrote:
>
>
> On 01/08/2025 2:28 pm, Alexandru Elisei wrote:
>> Hi,
>>
>> On Tue, Jul 01, 2025 at 04:31:56PM +0100, James Clark wrote:
>>> SPE can be used from within a guest as long as the driver adheres to the
>>> new VM interface spec [1]. Because the driver should behave correctly
>>> whether it's running in a guest or not, the first patches are marked as
>>> a fix. Furthermore, in future versions of the architecture the PE will
>>> be allowed to behave in the same way.
>>>
>>> The last patch adds new behavior to make it easier for guests to be
>>> able to reserve large buffers. It's not strictly necessary, so it's not
>>> marked as a fix.
>>
>> I had a look at the patches, and they all look ok to me, so for the
>> series:
>>
>> Reviewed-by: Alexandru Elisei <alexandru.elisei@....com>
>>
>> I also tested the series by hacking SPE virtualization support in KVM:
>>
>> - without these changes, the SPE driver gets into an infinite loop because it
>>   clears PMBSR_EL1.S before clearing PMBLIMITR_EL1.E, and the hypervisor is
>>   allowed to ignore the write to PMBSR_EL1.
>>
>> - with these changes, that doesn't happen.
>>
>> - ran perf for about a day in a loop in a virtual machine and didn't notice
>>   anything out of the ordinary.
>>
>> - ran perf for about a day in a loop on bare metal and similarly everything
>>   looked alright.
>>
>> - checked that the SPE driver correctly decodes the maximum buffer size for
>>   sizes 4M, 2M (2M is right at the boundary between the two encoding schemes)
>>   and 1M; that's also correctly reflected in
>>   /sys/devices/platform/<spe>/arm_spe_0/caps/max_buffer_size.
>>
>> - checked that perf is not allowed to use a buffer larger than the maximum.
>>
>> - checked that the SPE driver correctly detects a buffer size management
>>   event.
>>
>> So:
>>
>> Tested-by: Alexandru Elisei <alexandru.elisei@....com>
>>
>> While testing I noticed two things:
>>
>> 1. When perf tries to use a buffer larger than the maximum, the error is
>> EINVAL (22):
>>
>> # cat /sys/devices/platform/spe/arm_spe_0/caps/max_buff_size
>> 4194304
>> # perf record -ae arm_spe// -m,16M -- sleep 10
>> failed to mmap with 22 (Invalid argument)
>>
>> (used 16M as the buffer size because what the driver ends up programming
>> is half that).
>>
>> I would have expected to get back ENOMEM (12), which seems less ambiguous
>> to me. I had to hack the driver to print an error message to dmesg when
>> the max buffer size is exceeded to make sure that's why I was seeing the
>> error message in perf, and it wasn't because of something else. I get that
>> that's because .setup_aux() can only return NULL on error, but it feels
>> like there's room for improvement here.
>>
>
> We could add an error code; rb_alloc_aux() already returns one, and that
> is what calls setup_aux(). But the scenarios would be either EINVAL or
> ENOMEM and wouldn't give the user the exact reason ("need > 2 pages",
> "need an even number of pages", etc.). So I'm not sure it would be enough
> of an improvement over returning NULL to be worth it.
>
> However, I will add a warning to Perf if the user asks for more than
> caps/max_buffer_size. That would be a useful message, and Perf can do it
> itself so it doesn't need to be in the driver changes.
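[The Perf-side check proposed here can be approximated outside perf itself. A rough shell sketch, comparing a requested AUX buffer size (in bytes) against the advertised cap; in practice the cap would be read from the caps/max_buffer_size sysfs file this series adds, and the exact platform-device path varies by system:]

```shell
#!/bin/sh
# Sketch of the proposed warning: compare the AUX buffer size the user
# asked for against the SPE maximum advertised in sysfs. Reading the cap
# would look something like
#   max=$(cat /sys/devices/platform/*/arm_spe_0/caps/max_buffer_size)
# (adjust the glob to the platform device name on your system).

check_aux_size() {
    max=$1   # advertised caps/max_buffer_size, in bytes
    req=$2   # requested perf -m,<size> AUX buffer, in bytes
    if [ "$req" -gt "$max" ]; then
        echo "warning: AUX buffer $req exceeds max_buffer_size $max"
        return 1
    fi
    return 0
}

# The case from the thread: cap is 4M, request is 16M -> warn.
check_aux_size 4194304 16777216 || true
# A request at the cap passes silently.
check_aux_size 4194304 4194304
```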
>
>> 2. A hypervisor is allowed to inject a buffer size event even though the
>> buffer set by the guest is smaller than the maximum advertised. For
>> example, this can happen if there isn't enough memory to pin the buffer,
>> or if the limit on pinned memory is exceeded in the hypervisor
>> (implementation specific behaviour, not mandated in DEN0154, of course).
>>
>> In this situation, when the SPE driver gets a buffer size management event
>> injected by the hypervisor, there is no way for the driver to communicate
>> it to the perf instance, and the profiled process continues executing even
>> though profiling has stopped.
>>
>> That's not different from what happens today with buffer management
>> events, but unlike the other events, which aren't under the control of
>> userspace, the buffer size event is potentially recoverable if userspace
>> restarts perf with a smaller buffer.
>>
>> Do you think there's something that can be done to improve this
>> situation?
>>
>> Thanks,
>> Alex
>>
>
> It doesn't look like there's currently anything that can stop an event
> or signal to Perf that the event has gone bad.
>
> We could add something like "__u32 error" to struct perf_event_mmap_page,
> but I'm not sure what you'd do with it. If Perf is the parent of the
> process, you wouldn't want to kill it in case anything bad happens, so
> you're left with leaving it running anyway. If it's just an error message
> that you want, then there's already one in dmesg for buffer management
> errors, and that string is a lot better than a single code. Unless these
> new codes were detailed, PMU-specific ones? Actually it's a whole page, so
> why not make it a string...
>
> It's not a case of the samples ending randomly somewhere, though; you'll
> either get all of them or none of them, so it will be quite obvious to the
> user that something has gone wrong. Secondly, I think the scenario of not
> being able to pin memory when asking for less than the limit would be very
> rare. It's probably fine to leave it like this for now, and we can always
> add something later, maybe if people start to run into it for real.
Do we get EMPTY AUX records in this case where there was no profiling
data? If so, why not convey the error via AUX record header flags?

Suzuki
>
> James
>
>