lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 Nov 2023 22:32:10 +0000
From:   "Ashley, William" <wash@...zon.com>
To:     "Ashley, William" <wash@...zon.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>
Subject: Re: armv8pmu: Pending overflow interrupt is discarded when perf event
 is disabled

Adding linux-arm-kernel@...ts.infradead.org and linux-kernel@...r.kernel.org,
sorry for the noise.


On 11/20/23, 12:36 PM, "Ashley, William" <wash@...zon.com <mailto:wash@...zon.com>> wrote:


An issue [1] was opened in the rr-debugger project reporting occasional missed
perf event overflow signals on arm64. I've been digging into this and think I
understand what's happening, but wanted to confirm my understanding.


The attached example application, derived from an rr-debugger test case, reports
when the value of a counter doesn't increase by the expected period +/- some
tolerance. When it is ping-ponged between cores (e.g. with taskset) at a high
frequency, it frequently reports increases of ~2x the expected. I've confirmed
this same behavior on kernels 5.4, 5.10, 6.1 and 6.5.


I found armv8pmu_disable_intens [2] that is called as part of event
de-scheduling and contains
/* Clear the overflow flag in case an interrupt is pending. */
write_pmovsclr(mask);
which results in any pending overflow interrupt being dropped. I added some
debug output here and indeed there is a correlation of this bit being high at
the point of the above code and the reproducer identifying a missed signal.


This behavior does not occur with pseudo-NMIs (irqchip.gicv3_pseudo_nmi=1)
enabled.


When an event is not being explicitly torn down (e.g. being closed), this seems
like an undesirable behavior. I haven't attempted to demo it yet, but I suspect
an application disabling an event temporarily could occasionally see the same
missed overflow signals. Is my understanding here correct? Does anyone have
thoughts on how this could be addressed without creating other issues?


[1] https://github.com/rr-debugger/rr/issues/3607 <https://github.com/rr-debugger/rr/issues/3607>
[2] https://github.com/torvalds/linux/blob/c42d9eeef8e5ba9292eda36fd8e3c11f35ee065c/drivers/perf/arm_pmuv3.c#L652C20-L652C43 <https://github.com/torvalds/linux/blob/c42d9eeef8e5ba9292eda36fd8e3c11f35ee065c/drivers/perf/arm_pmuv3.c#L652C20-L652C43>







Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ