lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c34db006-4d5b-fb71-f998-63fcdcde6c0b@arm.com>
Date:   Wed, 30 Nov 2022 18:16:19 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Geoff Blake <blakgeof@...zon.com>
Cc:     will@...nel.org, mark.rutland@....com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better

On 2022-11-30 16:02, Geoff Blake wrote:
> Robin,
> 
>  From my perspective, this is a worse solution as now we're sweeping an
> issue under the rug and consuming CPU cycles handling IRQs we should not
> be getting in the first place.  While an overflow IRQ from the cmn should
> not be high frequency, there is a non-zero chance in the future it could
> be and this could lead to a very hard to debug performance issue instead
> of the current problem, which is discovering we need to clean up better
> from a noisy kernel message.

Kexec is not the only possible source of spurious IRQs. If they cause a 
problem for this driver, that cannot be robustly addressed by trying to 
rely on whatever software might happen to run before this driver.

> The driver as best I can grok currently is optimized to limit the amount
> of register writes for the common use-case, which is setting and unsetting
> events, so all the wiring for the PMU to feed events to the DTC is done up
> front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
> OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
> for when an event is actually set, and here we go through all of them
> anyways for each event unless its bynodeid, so the expense of setting
> events grows linearly with the mesh size anyways.

If arm_cmn_init_dtc() writing 0 to PMCR didn't stop the PMU then we've 
got bigger problems, because that's how we expect to start and stop it 
in normal operation. I'm not ruling out that some subtle bug in that 
regard might exist, since I've still not yet had a chance to reproduce 
and observe this behaviour on my board, but I've also not seen 
sufficient evidence to suggest that that is the case either. (Now that 
I'm looking closely, I think there *is* actually a small oversight for 
the DTMs, but that would lead to different symptoms than you reported)

At least the writes to PMOVSR_CLR *did* clearly work, because you're 
seeing the "nobody cared" message from the IRQ core rather than the 
WARN_ON(!dtc->counters[i]) which would happen if a fresh overflow was 
actually asserted. Currently I would expect to see up to 4 of those 
messages since there can be up to 4 IRQs, but once those are all 
requested, enabled, and "handled", all the spurious initially-latched 
state should be cleared and any *new* overflows will be indicated in 
PMOVSR. I don't see how a single IRQ could ever be unhandled more than 
once anyway, if the first time disables it.

Thanks,
Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ