Message-ID: <20260114175240.GA1286628@e132581.arm.com>
Date: Wed, 14 Jan 2026 17:52:40 +0000
From: Leo Yan <leo.yan@....com>
To: Will Deacon <will@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
Alexandru Elisei <alexandru.elisei@....com>,
James Clark <james.clark@...aro.org>,
linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] perf: arm_spe: Correct setting the
PERF_HES_STOPPED flag
On Thu, Jan 08, 2026 at 04:23:58PM +0000, Will Deacon wrote:
[...]
> > > How is it not for this flow? You're talking about:
> > >
> > > arm_spe_pmu_start
> > > => arm_spe_perf_aux_output_begin
> > > => arm_spe_pmu_next_off // Returns error
> > >
> > > The only way arm_spe_pmu_next_off() returns an error is if
> > > __arm_spe_pmu_next_off() fails, and that's the flow I'm talking about.
[...]
> > The issue is a mismatch between the state machine and the hardware
> > state. When arm_spe_perf_aux_output_begin() detects an error and does
> > not set PMBLIMITR_EL1_E, the trace unit is effectively stopped, but
> > the state machine is not updated to PERF_HES_STOPPED. This causes
> > callers to handle errors incorrectly [1][2].
> >
> > It is arguable that the disable IRQ work will eventually disable the
> > trace unit and update hw.state, but the state should be updated in the
> > first place by the PMU driver to notify even core layer.
>
> From what I can tell, perf_aux_output_end() will call
> perf_event_disable_inatomic() which should end up invoking
> perf_pending_disable() via an IPI-to-self to disable the event and put
> it in the PERF_HES_STOPPED state before we return to userspace.
>
> So I still struggle to see the problem here.
The issue is that the SPE driver does not properly propagate errors when
arm_spe_pmu_next_off() fails. Instead, it behaves as if tracing was
enabled successfully, which leads to redundant operations and an
inconsistent state in the perf core.
Let us dig into this a bit.
  arm_spe_pmu_start()
  {
      hwc->state = 0;

      /* Fails inside arm_spe_pmu_next_off() */
      arm_spe_perf_aux_output_begin(handle, event);

      /* hwc->state remains 0, so execution continues */
      if (hwc->state)
          return;

      reg = arm_spe_event_to_pmsfcr(event);
      write_sysreg_s(reg, SYS_PMSFCR_EL1);
      ...
  }
In arm_spe_pmu_start(), a failure in arm_spe_perf_aux_output_begin()
does not set PERF_HES_STOPPED, so hwc->state remains zero and the
function continues to program the filters even though the setup has
failed.
Moreover, the driver still returns success to the perf core. As a
result, event_sched_in() assumes the event was started correctly and
proceeds to enable other events.
  event_sched_in()
  {
      ...
      if (event->pmu->add(event, PERF_EF_START)) {
          perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
          event->oncpu = -1;
          ret = -EAGAIN;
          goto out;
      }
      ...
  }
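To make the mismatch concrete, here is a rough user-space model of the
two flows above. It is only a sketch and not kernel code: every model_*
name, MODEL_HES_STOPPED, and the fix_state switch are hypothetical
stand-ins invented for illustration, and the assumption that the add
callback reports failure based on the stopped flag is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PERF_HES_STOPPED; not the kernel value. */
#define MODEL_HES_STOPPED 0x01

struct model_event {
	int  hw_state;            /* models hwc->state */
	bool buffer_ok;           /* false models arm_spe_pmu_next_off() failing */
	bool filters_programmed;  /* models the PMSFCR_EL1 write being reached */
};

/*
 * Models arm_spe_perf_aux_output_begin(). With fix_state == false the
 * failure is silent, as described above; with fix_state == true the
 * failure is recorded in the state word.
 */
static void model_aux_output_begin(struct model_event *ev, bool fix_state)
{
	if (!ev->buffer_ok && fix_state)
		ev->hw_state |= MODEL_HES_STOPPED;
}

/* Models arm_spe_pmu_start(). */
static void model_start(struct model_event *ev, bool fix_state)
{
	assert(ev != NULL);

	ev->hw_state = 0;
	model_aux_output_begin(ev, fix_state);

	/* Only bails out if the failure was recorded in hw_state. */
	if (ev->hw_state)
		return;

	ev->filters_programmed = true;
}

/*
 * Models pmu->add(event, PERF_EF_START). Assumption: the callback
 * returns non-zero when the event ended up stopped.
 */
static int model_add(struct model_event *ev, bool fix_state)
{
	model_start(ev, fix_state);
	return (ev->hw_state & MODEL_HES_STOPPED) ? -1 : 0;
}
```

With buffer_ok == false and fix_state == false, model_add() returns 0
and the filters get programmed, which mirrors the behaviour described
above. With fix_state == true it returns non-zero, so a caller shaped
like event_sched_in() would take its error path.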
This breaks the event group case, for example:
perf record -e '{cs_etm//,cycles}' -- test
The perf core expects all events in a group to start and stop together,
but the SPE driver's incorrect reporting causes misalignment.
Sorry for the late reply.
Thanks,
Leo