linux-kernel - Re: [PATCH v2 1/2] perf: arm_spe: Correct setting the PERF_HES

Message-ID: <20260114175240.GA1286628@e132581.arm.com>
Date: Wed, 14 Jan 2026 17:52:40 +0000
From: Leo Yan <leo.yan@....com>
To: Will Deacon <will@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
	Alexandru Elisei <alexandru.elisei@....com>,
	James Clark <james.clark@...aro.org>,
	linux-arm-kernel@...ts.infradead.org,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] perf: arm_spe: Correct setting the
 PERF_HES_STOPPED flag

On Thu, Jan 08, 2026 at 04:23:58PM +0000, Will Deacon wrote:

[...]

> > > How is it not for this flow? You're talking about:
> > > 
> > > arm_spe_pmu_start
> > > 	=> arm_spe_perf_aux_output_begin
> > > 		=> arm_spe_pmu_next_off // Returns error
> > > 
> > > The only way arm_spe_pmu_next_off() returns an error is if
> > > __arm_spe_pmu_next_off() fails, and that's the flow I'm talking about.

[...]

> > The issue is a mismatch between the state machine and the hardware
> > state.  When arm_spe_perf_aux_output_begin() detects an error and does
> > not set PMBLIMITR_EL1_E, the trace unit is effectively stopped, but
> > the state machine is not updated to PERF_HES_STOPPED. This causes
> > callers to handle errors incorrectly [1][2].
> > 
> > It is arguable that the disable IRQ work will eventually disable the
> > trace unit and update hw.state, but the state should be updated in the
> > first place by the PMU driver to notify the event core layer.
> 
> From what I can tell, perf_aux_output_end() will call
> perf_event_disable_inatomic() which should end up invoking
> perf_pending_disable() via an IPI-to-self to disable the event and put
> it in the PERF_HES_STOPPED state before we return to userspace.
> 
> So I still struggle to see the problem here.

The issue is that the SPE driver does not properly propagate errors when
arm_spe_pmu_next_off() fails.  Instead, it behaves as if tracing had been
enabled successfully, which leads to redundant register programming and an
inconsistent event state in the perf core.

Let us dig a bit deeper.

  arm_spe_pmu_start()
  {
      hwc->state = 0;

      /* Fails inside arm_spe_pmu_next_off() */
      arm_spe_perf_aux_output_begin(handle, event);

      /* hwc->state remains 0, so execution continues */
      if (hwc->state)
          return;

      reg = arm_spe_event_to_pmsfcr(event);
      write_sysreg_s(reg, SYS_PMSFCR_EL1);
      ...
  }

In arm_spe_pmu_start(), a failure in arm_spe_perf_aux_output_begin()
does not set PERF_HES_STOPPED, so hwc->state remains zero and the
function continues to program the filters even though starting the AUX
output has failed.

Moreover, the driver still returns success to the perf core.  As a
result, event_sched_in() assumes the event was started correctly and
proceeds to enable the other events.

  event_sched_in()
  {
      ...

      if (event->pmu->add(event, PERF_EF_START)) {
          perf_event_set_state(event, PERF_EVENT_STATE_INACTIVE);
          event->oncpu = -1;
          ret = -EAGAIN;
          goto out;
      }

      ...
  }

This breaks the event group case, for example:

  perf record -e '{cs_etm//,cycles}' -- test

The perf core expects all events in a group to start and stop together,
but because the SPE driver reports a failed start as success, the group
is treated as running while the SPE event is actually stopped.

Sorry for the late reply.

Thanks,
Leo