Date:   Tue, 12 Jul 2022 19:08:24 -0400
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Vince Weaver <vincent.weaver@...ne.edu>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>
Subject: Re: [perf] unchecked MSR access error: WRMSR to 0x689 in
 intel_pmu_lbr_restore



On 2022-07-12 5:26 p.m., Vince Weaver wrote:
> On Tue, 12 Jul 2022, Pawan Gupta wrote:
> 
>> On Tue, Jul 12, 2022 at 03:39:56PM -0400, Vince Weaver wrote:
>> It appears this CPU does not support TSX feature (or disabling TSX). If
>> the bug is easy to reproduce, bisecting can help.
> 
> I thought TSX was disabled via firmware update for all Haswell machines?
> 
> In any case, the fuzzer is triggering the
> 	unchecked MSR access error: WRMSR to 0x689
> in intel_pmu_lbr_restore.  So either this is a false error and should be 
> disabled, or else it's a real issue and should be fixed.
> 

Could you please double check if the quirk can fix the issue on your
machine?

# Try writing the exact same value from the error log to 0x689. The write
# should fail.
wrmsr -p 0 0x689 0x1fffffff8101349e

# The quirk copies bits 59:60 to bits 61:62. The write below should succeed.
wrmsr -p 0 0x689 0x7fffffff8101349e
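
For reference, the arithmetic behind the second value can be checked in
user space. This is only a sketch of the bit copy described above, not the
kernel code itself; the names SIGNEXT_2MSB, TSX_BITS and signext_quirk_wr
are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 59 and 60: assumed to already be sign-extension bits of the address. */
#define SIGNEXT_2MSB ((UINT64_C(1) << 60) | (UINT64_C(1) << 59))
/* Bits 61 and 62: the positions the quirk fills in. */
#define TSX_BITS     ((UINT64_C(1) << 62) | (UINT64_C(1) << 61))

/* Copy bits 59:60 into bits 61:62 and leave every other bit unchanged. */
static uint64_t signext_quirk_wr(uint64_t val)
{
	/* Assumption of this sketch: bits 61:62 are clear on entry. */
	assert((val & TSX_BITS) == 0);
	return val | ((val & SIGNEXT_2MSB) << 2);
}
```

Feeding it the value from the error log, 0x1fffffff8101349e, gives
0x7fffffff8101349e, which is the value used in the second wrmsr above.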

> Unfortunately the fuzzer can take up to a few days to trigger the message 
> (it's not easily repeatable) so doing a kernel bisect would take a very 
> long time.
> 

lbr_from_signext_quirk_needed() is only invoked at boot time. Maybe we can
log its inputs to see which variable has an unexpected value.
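
To make the expected outcomes concrete, here is a rough user-space model of
that check. It is a sketch only: it takes the raw CPUID.(EAX=7,ECX=0):EBX
value, where HLE is bit 4 and RTM is bit 11, while the kernel uses
boot_cpu_has(), which may differ from raw CPUID if a feature was cleared at
boot. The name quirk_needed_model and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits. */
#define CPUID7_EBX_HLE (UINT32_C(1) << 4)
#define CPUID7_EBX_RTM (UINT32_C(1) << 11)

/*
 * The quirk is wanted only when the LBR format carries TSX bits
 * (lbr_has_tsx) but the CPU reports neither HLE nor RTM.
 */
static bool quirk_needed_model(uint32_t cpuid7_ebx, bool lbr_has_tsx)
{
	bool tsx_support = (cpuid7_ebx & (CPUID7_EBX_HLE | CPUID7_EBX_RTM)) != 0;

	return !tsx_support && lbr_has_tsx;
}
```

In this model the quirk is only enabled for the "NO HLE, NO RTM, LBR has
tsx 1" combination; any other combination leaves it off.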

Could you please apply the patch below, reboot into the patched kernel, and
share the dmesg log?

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 13179f31fe10..50435ab627ad 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -300,6 +300,9 @@ static inline bool lbr_from_signext_quirk_needed(void)
 	bool tsx_support = boot_cpu_has(X86_FEATURE_HLE) ||
 			   boot_cpu_has(X86_FEATURE_RTM);

+	pr_info("%s %s. LBR has tsx %d\n", boot_cpu_has(X86_FEATURE_HLE) ? "HLE" : "NO HLE",
+			boot_cpu_has(X86_FEATURE_RTM) ? "RTM" : "NO RTM",
+			x86_pmu.lbr_has_tsx);
 	return !tsx_support && x86_pmu.lbr_has_tsx;
 }


Thanks,
Kan
