linux-kernel - [kvm-unit-tests patch 4/5] x86/pmu: Handle instruction overcount issue in overflow test

Message-ID: <20250712174915.196103-5-dapeng1.mi@linux.intel.com>
Date: Sat, 12 Jul 2025 17:49:14 +0000
From: Dapeng Mi <dapeng1.mi@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>,
	Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Jim Mattson <jmattson@...gle.com>,
	Mingwei Zhang <mizhang@...gle.com>,
	Zide Chen <zide.chen@...el.com>,
	Das Sandipan <Sandipan.Das@....com>,
	Shukla Manali <Manali.Shukla@....com>,
	Yi Lai <yi1.lai@...el.com>,
	Dapeng Mi <dapeng1.mi@...el.com>,
	dongsheng <dongsheng.x.zhang@...el.com>,
	Dapeng Mi <dapeng1.mi@...ux.intel.com>
Subject: [kvm-unit-tests patch 4/5] x86/pmu: Handle instruction overcount issue in overflow test

From: dongsheng <dongsheng.x.zhang@...el.com>

During the execution of __measure(), VM exits (e.g., due to
WRMSR/EXTERNAL_INTERRUPT) may occur. On systems affected by the
instruction overcount issue, each VM-Exit/VM-Entry can erroneously
increment the instruction count by one, leading to false failures in
overflow tests.
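The exact check breaks in a simple way. Below is a minimal sketch in plain C that models only the counter arithmetic; it is not code from x86/pmu.c. It assumes 48-bit counters and uses a hypothetical SIM_LOOP_INSNS of 10 in place of LOOP_INSNS. All sim_* names are invented for this example, and each simulated VM-Exit/VM-Entry adds exactly one spurious count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit general-purpose counters (real code gets the width from CPUID). */
#define SIM_CNTR_WIDTH 48
#define SIM_CNTR_MASK ((UINT64_C(1) << SIM_CNTR_WIDTH) - 1)

/* Hypothetical stand-in for LOOP_INSNS in x86/pmu.c. */
#define SIM_LOOP_INSNS 10

/* Preset that makes the counter wrap past zero after n events: (1 - n), truncated. */
static uint64_t sim_overflow_preset(uint64_t n)
{
	return (UINT64_C(1) - n) & SIM_CNTR_MASK;
}

/*
 * Counter value after the loop retires, assuming every VM-Exit/VM-Entry
 * pair adds one spurious count (the overcount issue).
 */
static uint64_t sim_final_count(uint64_t preset, uint64_t loop_insns,
				unsigned int vm_exits)
{
	return (preset + loop_insns + vm_exits) & SIM_CNTR_MASK;
}

/* Mirrors the pre-patch Intel check: exactly one count past the wrap. */
static bool sim_exact_check(uint64_t count)
{
	return count == 1;
}
```

With no VM exits, sim_final_count(sim_overflow_preset(SIM_LOOP_INSNS), SIM_LOOP_INSNS, 0) is 1 and the exact check passes. A single VM-Exit/VM-Entry pair yields 2, so the same check fails even though the counter did overflow.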

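To address this, when the instruction overcount issue is detected
(INST_RETIRED_OVERCOUNT is set in intel_inst_overcount_flags), the patch
replaces the exact Intel check (cnt.count == 1) with a range-based check
(cnt.count < 14). In the same case, overflow_preset is set statically to
1 - LOOP_INSNS instead of being calibrated by a __measure() run in
measure_for_overflow().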
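The effect of the static preset can be sketched the same way. This is a simplified model, not the test's code. It assumes 48-bit counters, a hypothetical SIM_LOOP_INSNS of 10, and one spurious count per VM-Exit/VM-Entry, and it ignores the few extra instructions __measure() itself retires. The only value taken from the patch is the bound 14. The sketch compares a preset calibrated from an earlier, possibly overcounted, run with the fixed 1 - LOOP_INSNS preset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 48-bit counters; SIM_LOOP_INSNS is a hypothetical stand-in for LOOP_INSNS. */
#define SIM_CNTR_MASK ((UINT64_C(1) << 48) - 1)
#define SIM_LOOP_INSNS 10

/* Final counter value: preset + loop instructions + spurious VM-exit counts. */
static uint64_t sim_run(uint64_t preset, unsigned int vm_exits)
{
	return (preset + SIM_LOOP_INSNS + vm_exits) & SIM_CNTR_MASK;
}

/* True if the counter wrapped past zero (preset < 2^48, so no 64-bit wrap). */
static bool sim_overflowed(uint64_t preset, unsigned int vm_exits)
{
	return preset + SIM_LOOP_INSNS + vm_exits > SIM_CNTR_MASK;
}

/* Calibrated preset: the calibration run itself may include spurious counts. */
static uint64_t sim_dynamic_preset(unsigned int calib_exits)
{
	uint64_t measured = SIM_LOOP_INSNS + calib_exits;

	return (UINT64_C(1) - measured) & SIM_CNTR_MASK;
}

/* Static preset used by the patch when the overcount issue is present. */
static uint64_t sim_static_preset(void)
{
	return (UINT64_C(1) - SIM_LOOP_INSNS) & SIM_CNTR_MASK;
}

/* Range check with the bound taken from the patch. */
static bool sim_range_check(uint64_t count)
{
	return count < 14;
}
```

In this model, a calibration run with three spurious counts makes sim_dynamic_preset(3) too negative. A clean measurement run then ends at 2^48 - 2 without overflowing. The static preset always overflows and ends at 1 plus the number of spurious counts, which the range check accepts for up to 12 extra counts.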

With a static preset, the counter still overflows no matter how many
spurious counts VM exits add, so the test behaves consistently with the
intended loop instruction count. The subsequent status and status-clear
testing logic is left unchanged.

The upper bound of the range (14) was chosen empirically to keep the
test reliable across hardware variations.

Signed-off-by: dongsheng <dongsheng.x.zhang@...el.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@...ux.intel.com>
Tested-by: Yi Lai <yi1.lai@...el.com>
---
 x86/pmu.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 44c728a5..c54c0988 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -518,6 +518,21 @@ static void check_counters_many(void)
 
 static uint64_t measure_for_overflow(pmu_counter_t *cnt)
 {
+	/*
+	 * During the execution of __measure(), VM exits (e.g., due to
+	 * WRMSR/EXTERNAL_INTERRUPT) may occur. On systems affected by the
+	 * instruction overcount issue, each VM-Exit/VM-Entry can erroneously
+	 * increment the instruction count by one, leading to false failures
+	 * in overflow tests.
+	 *
+	 * To mitigate this, if the overcount issue is detected, we hardcode
+	 * the overflow preset to (1 - LOOP_INSNS) instead of calculating it
+	 * dynamically. This ensures that an overflow will reliably occur,
+	 * regardless of any overcounting caused by VM exits.
+	 */
+	if (intel_inst_overcount_flags & INST_RETIRED_OVERCOUNT)
+		return 1 - LOOP_INSNS;
+
 	__measure(cnt, 0);
 	/*
 	 * To generate overflow, i.e. roll over to '0', the initial count just
@@ -574,8 +589,12 @@ static void check_counter_overflow(void)
 			cnt.config &= ~EVNTSEL_INT;
 		idx = event_to_global_idx(&cnt);
 		__measure(&cnt, cnt.count);
-		if (pmu.is_intel)
-			report(cnt.count == 1, "cntr-%d", i);
+		if (pmu.is_intel) {
+			if (intel_inst_overcount_flags & INST_RETIRED_OVERCOUNT)
+				report(cnt.count < 14, "cntr-%d", i);
+			else
+				report(cnt.count == 1, "cntr-%d", i);
+		}
 		else
 			report(cnt.count == 0xffffffffffff || cnt.count < 7, "cntr-%d", i);
 
-- 
2.43.0