Message-ID: <20230406213640.GBZC87aMhjL8LN6NUI@fat_crate.local>
Date: Thu, 6 Apr 2023 23:36:40 +0200
From: Borislav Petkov <bp@...en8.de>
To: Rui Salvaterra <rsalvaterra@...il.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
"Rafael J. Wysocki" <rafael@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Amit Kucheria <amitk@...nel.org>,
Zhang Rui <rui.zhang@...el.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
linux-pm@...r.kernel.org
Subject: Re: [BUG?] unchecked MSR access error: WRMSR to 0x19c
CCing more appropriate people and quoting the whole mail...
On Wed, Apr 05, 2023 at 11:14:45PM +0100, Rui Salvaterra wrote:
> Hi, everyone,
>
> I have a Haswell (Core i7-4770R) machine running Linux 6.3-rc5 on
> which, after a while under load (say, compiling the kernel), I get
> this trace…
>
> [ 832.549630] unchecked MSR access error: WRMSR to 0x19c (tried to
> write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
> (throttle_active_work+0xa6/0x1d0)
> [ 832.549652] Call Trace:
> [ 832.549654] <TASK>
> [ 832.549655] process_one_work+0x1ab/0x300
> [ 832.549661] worker_thread+0x4b/0x340
> [ 832.549664] ? process_one_work+0x300/0x300
> [ 832.549676] kthread+0xac/0xc0
> [ 832.549679] ? kthread_exit+0x20/0x20
> [ 832.549682] ret_from_fork+0x1f/0x30
> [ 832.549693] </TASK>
>
> … after which I get these from time to time in dmesg.
>
> [ 836.709562] CPU7: Core temperature is above threshold, cpu clock is
> throttled (total events = 219)
> [ 836.709569] CPU3: Core temperature is above threshold, cpu clock is
> throttled (total events = 219)
> [ 1272.792138] CPU2: Core temperature is above threshold, cpu clock is
> throttled (total events = 1)
> [ 1272.792156] CPU6: Core temperature is above threshold, cpu clock is
> throttled (total events = 1)
>
> This is the microcode revision on the CPU.
>
> [ 0.000000] microcode: updated early: 0xe -> 0x1c, date = 2019-11-12
>
> Note that I have the exact same issue on an Ivy Bridge (Core
> i7-3720QM) machine, but not on an Ivy Bridge laptop (Celeron 1007U).
> Maybe this is a legitimate warning, but please note that I've
> thoroughly cleaned the machines before retesting to see if, by
> coincidence, I had any airway/cooling issues. The fact that it started
> happening recently (since Linux 6.1, I believe), and the fact that
> running stress-ng --cpu 16 before the unchecked WRMSR error happens
> doesn't cause any thermal throttling events, lead me to believe this
> is possibly some unintended oversight.
>
> Please let me know if you need any additional information (.config, or
> anything else).
>
> Thanks in advance,
> Rui
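
For anyone reading along, here is a minimal sketch that decodes the value from the trace above (0xaaa8 written to MSR 0x19c, IA32_THERM_STATUS). The bit names are my reading of the Intel SDM layout for that register and should be treated as assumptions, as should the helper names `set_bits` and `ALL_LOG_MASK`, which are hypothetical and exist only for this illustration. The log bits in this register are documented as write-0-to-clear, so a bit written as 0 is the one being cleared. Whether every one of these bits is implemented on Haswell or Ivy Bridge is not established in this thread.

```python
# Decode the value from the "unchecked MSR access error" trace.
# Assumption: bit names follow the Intel SDM description of
# IA32_THERM_STATUS (MSR 0x19c); verify against the SDM before relying on them.
LOG_BITS = {
    1: "thermal status log",
    3: "PROCHOT#/FORCEPR# log",
    5: "critical temperature log",
    7: "thermal threshold #1 log",
    9: "thermal threshold #2 log",
    11: "power limitation log",
    13: "current limit log",
    15: "cross-domain limit log",
}

# Mask with every log bit set (hypothetical name, for illustration only).
ALL_LOG_MASK = sum(1 << bit for bit in LOG_BITS)


def set_bits(value):
    """Return the positions of all bits set in a 64-bit value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


written = 0xAAA8  # value from the trace

print(f"all log bits mask: {ALL_LOG_MASK:#x}")
for bit in set_bits(written):
    print(f"bit {bit:2d} written as 1: {LOG_BITS.get(bit, 'not a log bit')}")

# Log bits written as 0 are the ones the write attempts to clear.
for bit in set_bits(ALL_LOG_MASK & ~written):
    print(f"bit {bit:2d} written as 0 (cleared): {LOG_BITS[bit]}")
```

Under these assumptions, the write sets bits 3 through 15 (odd positions) and leaves only bit 1 at zero, so it reads as an attempt to clear a single log bit while preserving the others.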
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette