Message-ID: <EBAF38AB-2BE5-425F-8A52-DDCB0B390309@amazon.com>
Date: Mon, 20 Nov 2023 22:32:10 +0000
From: "Ashley, William" <wash@...zon.com>
To: "Ashley, William" <wash@...zon.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>
Subject: Re: armv8pmu: Pending overflow interrupt is discarded when perf event
is disabled
Adding linux-arm-kernel@...ts.infradead.org and linux-kernel@...r.kernel.org,
sorry for the noise.
On 11/20/23, 12:36 PM, "Ashley, William" <wash@...zon.com> wrote:
An issue [1] was opened in the rr-debugger project reporting occasional missed
perf event overflow signals on arm64. I've been digging into this and think I
understand what's happening, but wanted to confirm my understanding.
The attached example application, derived from an rr-debugger test case, reports
when the value of a counter doesn't increase by the expected period +/- some
tolerance. When it is ping-ponged between cores (e.g. with taskset) at a high
frequency, it frequently reports increases of ~2x the expected period. I've
confirmed this same behavior on kernels 5.4, 5.10, 6.1 and 6.5.
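The kind of check described above can be sketched in a few lines of C. This is only an illustration and not the attached reproducer: the names `classify_delta` and `enum delta_kind`, and the choice to treat any delta near a multiple of the period (2x or more) as a likely missed signal, are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the counter increase between two signals. */
enum delta_kind {
    DELTA_OK,      /* within period +/- tolerance */
    DELTA_MISSED,  /* near 2x (or more) of the period: signal(s) likely missed */
    DELTA_OTHER    /* not near any non-zero multiple of the period */
};

enum delta_kind classify_delta(uint64_t prev, uint64_t cur,
                               uint64_t period, uint64_t tolerance)
{
    assert(period > 0 && tolerance < period / 2);

    /* Unsigned subtraction stays correct if the counter value wraps. */
    uint64_t delta = cur - prev;

    /* Distance from delta to the nearest multiple of the period. */
    uint64_t rem = delta % period;
    int near_upper = rem > period - rem;
    uint64_t dist = near_upper ? period - rem : rem;
    uint64_t n = delta / period + (near_upper ? 1 : 0);

    if (n == 0 || dist > tolerance)
        return DELTA_OTHER;
    return n == 1 ? DELTA_OK : DELTA_MISSED;
}
```

With a period of 1000 and a tolerance of 50, an increase of 990 is classified as `DELTA_OK`, while an increase of 2010 (the ~2x case reported above) is classified as `DELTA_MISSED`.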
I found that armv8pmu_disable_intens [2], which is called as part of event
de-scheduling, contains
/* Clear the overflow flag in case an interrupt is pending. */
write_pmovsclr(mask);
which results in any pending overflow interrupt being dropped. I added some
debug output here, and this bit being set at the point of the above code does
correlate with the reproducer identifying a missed signal.
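To make the suspected sequence concrete, here is a toy userspace model of it. It is not kernel code: `struct toy_pmu` and every `toy_*` function are hypothetical, and the model assumes the overflow interrupt could not be taken between the counter overflowing and the disable path running. Only the flag-clearing step mirrors the snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a PMU's overflow bookkeeping (all names hypothetical). */
struct toy_pmu {
    uint32_t ovs;        /* overflow status flags, one bit per counter */
    uint32_t inten;      /* interrupt enable bits, one bit per counter */
    unsigned delivered;  /* overflow notifications actually delivered */
};

/* A counter overflows: its overflow flag becomes set. */
void toy_overflow(struct toy_pmu *p, unsigned idx)
{
    assert(idx < 32);
    p->ovs |= 1u << idx;
}

/* The interrupt handler runs: deliver and clear enabled, pending overflows. */
void toy_handle_irq(struct toy_pmu *p)
{
    uint32_t pending = p->ovs & p->inten;
    for (unsigned i = 0; i < 32; i++)
        if (pending & (1u << i))
            p->delivered++;
    p->ovs &= ~pending;
}

/* Models the quoted step: disabling also clears the overflow flag. */
void toy_disable_intens(struct toy_pmu *p, uint32_t mask)
{
    p->inten &= ~mask;
    p->ovs &= ~mask;  /* stands in for write_pmovsclr(mask) */
}

void toy_enable_intens(struct toy_pmu *p, uint32_t mask)
{
    p->inten |= mask;
}

/* One overflow, then a de-schedule/re-schedule; returns deliveries seen. */
unsigned toy_race_demo(bool irq_taken_before_disable)
{
    struct toy_pmu p = { .ovs = 0, .inten = 1u, .delivered = 0 };

    toy_overflow(&p, 0);
    if (irq_taken_before_disable)
        toy_handle_irq(&p);
    toy_disable_intens(&p, 1u);  /* event is scheduled out */
    toy_enable_intens(&p, 1u);   /* event is scheduled back in */
    toy_handle_irq(&p);          /* nothing is pending if the flag was cleared */
    return p.delivered;
}
```

In this model, `toy_race_demo(true)` delivers the overflow once, while `toy_race_demo(false)` delivers nothing because the flag is cleared while the overflow is still pending.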
This behavior does not occur with pseudo-NMIs (irqchip.gicv3_pseudo_nmi=1)
enabled.
When an event is not being explicitly torn down (e.g. being closed), this seems
like undesirable behavior. I haven't attempted to demo it yet, but I suspect
an application disabling an event temporarily could occasionally see the same
missed overflow signals. Is my understanding here correct? Does anyone have
thoughts on how this could be addressed without creating other issues?
[1] https://github.com/rr-debugger/rr/issues/3607
[2] https://github.com/torvalds/linux/blob/c42d9eeef8e5ba9292eda36fd8e3c11f35ee065c/drivers/perf/arm_pmuv3.c#L652C20-L652C43