Date: Mon, 4 Mar 2024 14:22:50 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Borislav Petkov <bp@...en8.de>, x86-ml <x86@...nel.org>
Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Alexander Antonov <alexander.antonov@...ux.intel.com>,
 lkml <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: unchecked MSR access error: WRMSR to 0xd84 (tried to write
 0x0000000000010003) at rIP: 0xffffffffa025a1b8
 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore])



On 2024-03-04 1:18 p.m., Borislav Petkov wrote:
> Hi all,
> 
> sending this to a bunch of people who have touched this function
> recently and some more relevant Intel folks.
> 
> The machine is an old SNB:
> 
> smpboot: CPU0: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz (family: 0x6, model: 0x2d, stepping: 0x7)
> 
> and with latest linus/master + tip/master it gives the below.
> 
> It must be something new because 6.8-rc6 is fine.
> 
> ...
> i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input5
> i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
> ACPI: button: Power Button [PWRF]
> i2c i2c-14: 4/4 memory slots populated (from DMI)
> unchecked MSR access error: WRMSR to 0xd84 (tried to write 0x0000000000010003) at rIP: 0xffffffffa025a1b8 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore])

0xd84 is the box control MSR of CBOX 4 (see page 11 of
https://www.intel.com/content/www/us/en/develop/download/intel-xeon-processor-e5-v2-and-e7-v2-product-families-uncore-performance-monitoring.html
for the definition of the MSR).
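
As a rough illustration of how the address and the written value in the
error message decode, here is a minimal sketch. The constants are
assumptions: they are what I believe arch/x86/events/intel/uncore_snbep.c
uses for the SNBEP CBOX layout (CBOX 0 box control at 0xd04, a stride of
0x20 per CBOX, and reset/freeze-enable bits in the init value), and the
helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SNBEP CBOX MSR layout; verify against uncore_snbep.c. */
#define C0_MSR_PMON_BOX_CTL  0xd04u      /* box control MSR of CBOX 0 */
#define CBO_MSR_OFFSET       0x20u       /* MSR stride between CBOXes */

/* Assumed box control bits written when a box is initialized. */
#define BOX_CTL_RST_CTRL     (1u << 0)   /* reset control registers */
#define BOX_CTL_RST_CTRS     (1u << 1)   /* reset counters */
#define BOX_CTL_FRZ_EN       (1u << 16)  /* freeze enable */

/* Hypothetical helper: map a CBOX box control MSR back to its box index. */
static unsigned int cbox_from_box_ctl(uint32_t msr)
{
	assert(msr >= C0_MSR_PMON_BOX_CTL);
	assert((msr - C0_MSR_PMON_BOX_CTL) % CBO_MSR_OFFSET == 0);
	return (msr - C0_MSR_PMON_BOX_CTL) / CBO_MSR_OFFSET;
}

/* Hypothetical helper: the value written to a box control MSR at init. */
static uint64_t box_ctl_init_value(void)
{
	return BOX_CTL_RST_CTRL | BOX_CTL_RST_CTRS | BOX_CTL_FRZ_EN;
}
```

Under those assumptions, cbox_from_box_ctl(0xd84) gives 4 and
box_ctl_init_value() gives 0x10003, which matches the WRMSR in the report.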

It looks like the driver tries to access CBOX 4, which is not
available on this machine.

The number of available CBOXes on a SNBEP machine is determined at boot
time. It should not be larger than the maximum number of cores.
The recent commit 89b0f15f408f ("x86/cpu/topology: Get rid of
cpuinfo::x86_max_cores") changed boot_cpu_data.x86_max_cores to
topology_num_cores_per_package().
I guess the new function returns a different maximum number of
cores on this machine, but I don't have a SNBEP machine on hand. Could you
please check whether it returns a different maximum number of cores?
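
To make the suspicion concrete, here is a minimal sketch of the boot-time
clamp described above. It is not the driver code: the function names are
hypothetical, the static upper bound of 8 CBOXes is an assumption, and the
core counts passed in are illustrative inputs, not values measured on the
affected machine.

```c
#include <assert.h>

/* Assumed static upper bound on CBOXes for the SNBEP uncore type. */
#define SNBEP_CBOX_MAX_BOXES 8u

/* Hypothetical helper: limit the CBOX count to the reported core count. */
static unsigned int clamp_cbox_count(unsigned int num_boxes,
				     unsigned int max_cores)
{
	return num_boxes > max_cores ? max_cores : num_boxes;
}

/* Hypothetical helper: would the driver initialize (WRMSR to) this box? */
static int cbox_gets_initialized(unsigned int box_idx, unsigned int max_cores)
{
	assert(box_idx < SNBEP_CBOX_MAX_BOXES);
	return box_idx < clamp_cbox_count(SNBEP_CBOX_MAX_BOXES, max_cores);
}
```

If the reported maximum is 4, CBOX 4 is never touched. If the new function
reports anything larger than 4 on this part, the clamp no longer protects
CBOX 4 and the WRMSR to 0xd84 would fault as in the report.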

Thanks,
Kan

> Call Trace:
>  <TASK>
>  ? ex_handler_msr+0xcb/0x130
>  ? fixup_exception+0x166/0x320
>  ? exc_general_protection+0xd7/0x3f0
>  ? asm_exc_general_protection+0x22/0x30
>  ? snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore]
>  uncore_box_ref.part.0+0x9c/0xc0 [intel_uncore]
>  ? __pfx_uncore_event_cpu_online+0x10/0x10 [intel_uncore]
>  uncore_event_cpu_online+0x56/0x140 [intel_uncore]
>  ? __pfx_uncore_event_cpu_online+0x10/0x10 [intel_uncore]
>  cpuhp_invoke_callback+0x174/0x5e0
>  ? cpuhp_thread_fun+0x5a/0x200
>  cpuhp_thread_fun+0x17e/0x200
>  ? smpboot_thread_fn+0x2b/0x250
>  smpboot_thread_fn+0x1ad/0x250
>  ? __pfx_smpboot_thread_fn+0x10/0x10
>  kthread+0xed/0x120
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x30/0x50
>  ? __pfx_kthread+0x10/0x10
> iTCO_vendor_support: vendor-support=0
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> iTCO_wdt iTCO_wdt.1.auto: Found a Patsburg TCO device (Version=2, TCOBASE=0x0460)
> iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0)
> RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
> ...
> 
> Thx.
> 
