Message-ID: <CAJZ5v0gPTKDYpze-ejhA3ySJB0dHXQQ4uZfXQFed=PrsWh=aqw@mail.gmail.com>
Date: Thu, 13 Nov 2025 12:04:13 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
Cc: rafael@...nel.org, daniel.lezcano@...aro.org, corbet@....net,
linux-pm@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Documentation: thermal: Add documentation for thermal throttle
On Thu, Nov 13, 2025 at 2:41 AM Srinivas Pandruvada
<srinivas.pandruvada@...ux.intel.com> wrote:
>
> Add documentation for Intel thermal throttling reporting events.
>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
> ---
> Documentation/admin-guide/thermal/index.rst | 1 +
> .../admin-guide/thermal/thermal_throttle.rst | 84 +++++++++++++++++++
> 2 files changed, 85 insertions(+)
> create mode 100644 Documentation/admin-guide/thermal/thermal_throttle.rst
>
> diff --git a/Documentation/admin-guide/thermal/index.rst b/Documentation/admin-guide/thermal/index.rst
> index 193b7b01a87d..2e0cafd19f6b 100644
> --- a/Documentation/admin-guide/thermal/index.rst
> +++ b/Documentation/admin-guide/thermal/index.rst
> @@ -6,3 +6,4 @@ Thermal Subsystem
> :maxdepth: 1
>
> intel_powerclamp
> + thermal_throttle
> diff --git a/Documentation/admin-guide/thermal/thermal_throttle.rst b/Documentation/admin-guide/thermal/thermal_throttle.rst
> new file mode 100644
> index 000000000000..ab146ffdffca
> --- /dev/null
> +++ b/Documentation/admin-guide/thermal/thermal_throttle.rst
> @@ -0,0 +1,84 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +=======================================
> +Intel thermal throttle events reporting
> +=======================================
> +
> +:Author: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>
> +
> +Introduction
> +------------
> +
> +Intel processors have built-in automatic and adaptive thermal monitoring mechanisms
> +that force the processor to reduce its power consumption in order to operate within
> +predetermined temperature limits.
> +
> +Refer to section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and IA-32
> +Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, & 3D): System
> +Programming Guide" for more details.
> +
> +In general, there are two mechanisms to control the core temperature of the processor.
> +They are called "Thermal Monitor 1" (TM1) and "Thermal Monitor 2" (TM2).
> +
> +The status of the temperature sensor that triggers the thermal monitor (TM1/TM2) is
> +indicated through the "thermal status flag" and "thermal status log flag" in the
> +IA32_THERM_STATUS MSR for core level and IA32_PACKAGE_THERM_STATUS for package level.
I would use the MSR names from the code, that is MSR_IA32_THERM_STATUS
and MSR_IA32_PACKAGE_THERM_STATUS, respectively, here and below.
> +
> +Thermal Status flag, bit 0 — When set, indicates that the processor core temperature
> +is currently at the trip temperature of the thermal monitor and that the processor power
> +consumption is being reduced via either TM1 or TM2, depending on which is enabled. When
> +clear, the flag indicates that the core temperature is below the thermal monitor trip
> +temperature. This flag is read only.
> +
> +Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has tripped
> +since the last power-up or reset or since the last time that software cleared this flag.
> +This flag is a sticky bit; once set it remains set until cleared by software or until a
> +power-up or reset of the processor. The default state is clear.
> +
> +It is possible that when a user reads IA32_THERM_STATUS or IA32_PACKAGE_THERM_STATUS,
> +TM1/TM2 is not active. In this case, "Thermal Status flag" will read "0" and the
> +"Thermal Status Log flag" will be set to show any previous "TM1/TM2" activation. But
> +since it needs to be cleared by software, it can't show the number of occurrences of
> +"TM1/TM2" activations.
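To illustrate the two flags described above, here is a minimal Python
sketch that decodes them from a raw status value. It assumes the value
has already been obtained from IA32_THERM_STATUS by some other means;
the function name and the dictionary keys are hypothetical, and only
bits 0 and 1 are interpreted.

```python
# Bit positions as described in the quoted text above.
THERMAL_STATUS = 1 << 0      # bit 0: set while TM1/TM2 is active
THERMAL_STATUS_LOG = 1 << 1  # bit 1: sticky, set until cleared by software


def decode_therm_status(raw):
    """Decode bits 0 and 1 of a raw thermal status value.

    Hypothetical helper for illustration; `raw` is assumed to be the
    integer content of the status register.
    """
    return {
        "throttling_now": bool(raw & THERMAL_STATUS),
        "tripped_since_clear": bool(raw & THERMAL_STATUS_LOG),
    }
```

A value with only bit 1 set corresponds to the case described above:
TM1/TM2 is not active at the time of the read, but it has been active
at some point since the log flag was last cleared.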
> +
> +Hence, Linux provides counters of how many times the "Thermal Status flag" was set. It also
> +reports how long the "Thermal Status flag" was active, in milliseconds. Using these counters,
> +users can check if the performance was limited because of thermal events. It is recommended
> +to read from sysfs instead of directly reading MSRs as the "Thermal Status Log flag" is reset
> +by the driver to implement rate control.
> +
> +Sysfs Interface
> +---------------
> +
> +Thermal throttling events are presented for each CPU under
> +"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU number.
> +
> +All these counters are read-only and can't be reset to 0, so they can potentially
> +wrap around after reaching the maximum 64-bit unsigned integer value.
> +
> +``core_throttle_count``
> + This shows how many times "Thermal Status flag" changed from 0 to 1
> + for this CPU. This is a 64-bit counter.
I would say "Number of times "Thermal Status flag" has changed from 0
to 1 since ...."
> +
> +``package_throttle_count``
> + This shows how many times "Thermal Status flag" changed from 0 to 1
> + for this package. Package status is broadcast to all CPUs; all CPUs in
> + the package increment this count. This is a 64-bit counter.
I would say "Number of times "Thermal Status flag" has changed from 0
to 1 for the package containing this CPU since ..."
> +
> +``core_throttle_max_time_ms``
> + This shows the maximum amount of time "Thermal Status flag" was set to 1
> + for this CPU for core level flag.
I would say "Maximum amount of time for which "Thermal Status flag"
has been set to 1 for this CPU at the core level since ...".
And analogously below.
> +
> +``package_throttle_max_time_ms``
> + This shows the maximum amount of time "Thermal Status flag" was set to 1
> + for this CPU for package level flag.
> +
> +``core_throttle_total_time_ms``
> + This shows the cumulative time "Thermal Status flag" was set to 1 for this
> + CPU for core level flag.
> +
> +``package_throttle_total_time_ms``
> + This shows the cumulative time "Thermal Status flag" was set to 1 for this
> + CPU for package level flag.
> +
> --
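As an illustration of how the sysfs attributes listed in the patch
could be consumed, here is a minimal Python sketch. It assumes each
attribute file holds a single decimal integer. The function names and
the `root` parameter (there only so the path can be overridden) are
hypothetical. Because the counters cannot be reset, the sketch compares
two samples and computes the difference modulo 2^64, which tolerates a
single wrap-around between samples.

```python
from pathlib import Path

# Attribute names taken from the quoted patch.
COUNTERS = (
    "core_throttle_count",
    "package_throttle_count",
    "core_throttle_max_time_ms",
    "package_throttle_max_time_ms",
    "core_throttle_total_time_ms",
    "package_throttle_total_time_ms",
)


def read_throttle_counters(cpu, root="/sys/devices/system/cpu"):
    """Return a dict of counter name -> int, or None if unreadable."""
    base = Path(root) / f"cpu{cpu}" / "thermal_throttle"
    values = {}
    for name in COUNTERS:
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # Attribute missing or not a plain integer (assumption violated).
            values[name] = None
    return values


def counter_delta(before, after, bits=64):
    """Difference between two samples of an unsigned wrapping counter."""
    return (after - before) % (1 << bits)
```

For example, sampling `read_throttle_counters(0)` before and after a
workload and passing the two `core_throttle_count` values to
`counter_delta()` gives the number of 0-to-1 transitions on CPU 0
during that workload. The delta only makes sense for the count and
total-time attributes, not for the maximum-time ones.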