Message-ID: <aW-wfy3u5OD3BQAQ@willie-the-truck>
Date: Tue, 20 Jan 2026 16:42:39 +0000
From: Will Deacon <will@...nel.org>
To: Leo Yan <leo.yan@....com>
Cc: Mark Rutland <mark.rutland@....com>,
Alexandru Elisei <alexandru.elisei@....com>,
James Clark <james.clark@...aro.org>,
linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] perf: arm_spe: Correct setting the
PERF_HES_STOPPED flag

On Wed, Jan 14, 2026 at 05:52:40PM +0000, Leo Yan wrote:
> > > The issue is a mismatch between the state machine and the hardware
> > > state. When arm_spe_perf_aux_output_begin() detects an error and does
> > > not set PMBLIMITR_EL1_E, the trace unit is effectively stopped, but
> > > the state machine is not updated to PERF_HES_STOPPED. This causes
> > > callers to handle errors incorrectly [1][2].
> > >
> > > It is arguable that the disable IRQ work will eventually disable the
> > > trace unit and update hw.state, but the state should be updated in the
> > > first place by the PMU driver, so that the perf core layer is notified.
> >
> > From what I can tell, perf_aux_output_end() will call
> > perf_event_disable_inatomic() which should end up invoking
> > perf_pending_disable() via an IPI-to-self to disable the event and put
> > it in the PERF_HES_STOPPED state before we return to userspace.
> >
> > So I still struggle to see the problem here.
>
> The issue is that the SPE driver does not properly propagate errors when
> arm_spe_pmu_next_off() fails. Instead, it behaves as if tracing was
> enabled successfully, which leads to redundant operations and an
> inconsistent state in the perf core.
>
> Let us dig a bit.
>
>   arm_spe_pmu_start()
>   {
>       hwc->state = 0;
>
>       /* Fails inside arm_spe_pmu_next_off() */
>       arm_spe_perf_aux_output_begin(handle, event);
>
>       /* hwc->state remains 0, so execution continues */
>       if (hwc->state)
>           return;
>
>       reg = arm_spe_event_to_pmsfcr(event);
>       write_sysreg_s(reg, SYS_PMSFCR_EL1);
>       ...
>   }
>
> In arm_spe_pmu_start(), a failure in arm_spe_perf_aux_output_begin()
> does not set PERF_HES_STOPPED, so hwc->state remains zero and the
> function continues to program filters even though the call has failed.
>
> Moreover, the driver still returns success to the perf core. As a
> result, event_sched_in() assumes the event was started correctly and
> proceeds to enable other events.
>
>   event_sched_in()
>   {
>       ...
>
>       if (event->pmu->add(event, PERF_EF_START)) {
>           perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
>           event->oncpu = -1;
>           ret = -EAGAIN;
>           goto out;
>       }
>
>       ...
>   }
>
> This breaks the event group case, for example:
>
> perf record -e '{cs_etm//,cycles}' -- test
>
> The perf core expects all events in a group to start and stop together,
> but the SPE driver's incorrect reporting causes misalignment.

Ok, so looking at this and the next patch I wonder if we could simplify
things a little by having arm_spe_perf_aux_output_begin() return an 'int'
to indicate success/failure instead of touching 'hwc->state'.

Then arm_spe_pmu_start() and the interrupt handler could call into
arm_spe_pmu_stop() if they get an error code back. Would that work?
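
To make the suggestion concrete, here is a rough userspace model of the
control flow, not kernel code and not a patch. Everything prefixed
model_/MODEL_ is hypothetical: 'state' stands in for hwc->state,
'buffer_ok == false' simulates arm_spe_pmu_next_off() failing,
'tracing_enabled' stands in for PMBLIMITR_EL1_E, and the -ENOSPC error
value is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PERF_HES_STOPPED; the value is illustrative. */
#define MODEL_HES_STOPPED 0x01

/* Hypothetical, heavily simplified model of the per-event state. */
struct model_event {
	int  state;              /* stands in for hwc->state */
	bool buffer_ok;          /* false simulates a buffer setup failure */
	bool tracing_enabled;    /* stands in for PMBLIMITR_EL1_E being set */
	bool filters_programmed; /* stands in for the PMSFCR_EL1 write */
};

/* Reports success/failure via the return value and leaves 'state' alone. */
static int model_aux_output_begin(struct model_event *ev)
{
	assert(ev != NULL);

	if (!ev->buffer_ok)
		return -ENOSPC; /* assumed error code, for illustration only */

	ev->tracing_enabled = true;
	return 0;
}

/* The single place where the stopped state is recorded. */
static void model_stop(struct model_event *ev)
{
	ev->tracing_enabled = false;
	ev->state |= MODEL_HES_STOPPED;
}

/* On failure, stop the event instead of going on to program filters. */
static void model_start(struct model_event *ev)
{
	ev->state = 0;

	if (model_aux_output_begin(ev)) {
		model_stop(ev);
		return;
	}

	ev->filters_programmed = true;
}
```

In this model, calling model_start() on an event with buffer_ok == false
leaves MODEL_HES_STOPPED set and the filters untouched, so a caller that
inspects the state afterwards can see the failure. The interrupt handler
path would presumably follow the same pattern.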

Will