[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b15d3d1-389b-fee4-d1b9-8732859e3696@linux.intel.com>
Date: Mon, 11 Jul 2022 11:25:34 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Vince Weaver <vincent.weaver@...ne.edu>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
pawan.kumar.gupta@...ux.intel.com
Subject: Re: [perf] unchecked MSR access error: WRMSR to 0x689 in
intel_pmu_lbr_restore
On 2022-07-08 12:13 p.m., Vince Weaver wrote:
> On Wed, 6 Jul 2022, Vince Weaver wrote:
>
>> Let the fuzzer running a long time on 5.19-rc1 and after a few weeks it
>> triggered this weird trace. It is repeatable (although I haven't
>> narrowed down exactly what's causing it).
>>
>> It's odd in that it just dumps a <TASK>, it doesn't provide any info on
>> what the actual trigger is.
>>
>> This is on a Haswell machine.
>
> I bumped up to current git and managed to trigger this again, this time
> it actually managed to print the error message.
>
> [ 7763.384369] unchecked MSR access error: WRMSR to 0x689 (tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4 (native_write_msr+0x4/0x20)
The 0x689 is a valid LBR register, which is MSR_LASTBRANCH_9_FROM_IP.
The issue should be caused by the known TSX bug, which is mentioned in
the commit 9fc9ddd61e0 ("perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug
when no TSX"). It looks like the TSX support has been deactivated,
however the quirk in the commit isn't applied for some reason.
To apply the quirk, perf relies on the boot CPU's flag and LBR format.
static inline bool lbr_from_signext_quirk_needed(void)
{
bool tsx_support = boot_cpu_has(X86_FEATURE_HLE) ||
boot_cpu_has(X86_FEATURE_RTM);
return !tsx_support && x86_pmu.lbr_has_tsx;
}
Could you please share the value of the PERF_CAPABILITIES MSR 0x00000345
of the machine?
I'd like to double check whether the LBR fromat is correct. 0x5 is expected.
If the LBR format is correct, maybe the boot CPU's flag is not cleared
when the TSX support is deactivated.
I noticed that Pawan recently had several TSX patches merged which may
impact the flags.
400331f8ffa3 ("x86/tsx: Disable TSX development mode at boot")
258f3b8c3210 ("x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits")
If you only observe the issue with the latest kernel, you may want to
revert the above two patches and see if it helps.
Thanks,
Kan
> [ 7763.397420] Call Trace:
> [ 7763.399881] <TASK>
> [ 7763.401994] intel_pmu_lbr_restore+0x9a/0x1f0
> [ 7763.406363] intel_pmu_lbr_sched_task+0x91/0x1c0
> [ 7763.410992] __perf_event_task_sched_in+0x1cd/0x240
> [ 7763.415879] ? __perf_event_task_sched_out+0x75/0x5c0
> [ 7763.420939] finish_task_switch.isra.0+0x15b/0x2a0
> [ 7763.425740] ? __switch_to+0x112/0x430
> [ 7763.429503] __schedule+0x2cf/0x10d0
> [ 7763.433088] ? send_signal_locked+0xc8/0x130
> [ 7763.437364] schedule+0x4e/0xb0
> [ 7763.440518] do_wait+0x15b/0x2f0
> [ 7763.443757] kernel_wait4+0xa6/0x140
> [ 7763.447337] ? thread_group_exited+0x60/0x60
> [ 7763.451608] do_syscall_64+0x3b/0xc0
> [ 7763.455199] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 7763.460259] RIP: 0033:0x7f9063feba26
> [ 7763.463838] Code: ff e9 0e 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 90 48 83 ec 28 89 54 24 14 48 89 74 24
> [ 7763.482587] RSP: 002b:00007ffca3875c08 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> [ 7763.490161] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f9063feba26
> [ 7763.497303] RDX: 0000000000000000 RSI: 00007ffca3875c1c RDI: 0000000000002214
> [ 7763.504441] RBP: 00007ffca3875c20 R08: 00007f90640f321c R09: 00007f90640f3240
> [ 7763.511577] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d6ce93c4f0
> [ 7763.518718] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 7763.525850] </TASK>
>
Powered by blists - more mailing lists