Message-Id: <20250812082510.32291-1-yaozhenguo@jd.com>
Date: Tue, 12 Aug 2025 16:25:10 +0800
From: yaozhenguo <yaozhenguo1@...il.com>
To: tglx@...utronix.de,
yaoma@...ux.alibaba.com,
akpm@...ux-foundation.org
Cc: max.kellermann@...os.com,
lihuafei1@...wei.com,
yaozhenguo@...com,
linux-kernel@...r.kernel.org,
ZhenguoYao <yaozhenguo1@...il.com>
Subject: [PATCH] watchdog/softlockup: Fix incorrect CPU utilization output during softlockup
From: ZhenguoYao <yaozhenguo1@...il.com>
The CPU utilization statistics are kept with 16-bit precision, so
the raw data goes through integer division (by 2^24) and loses its
low-order bits. This makes the calculated CPU utilization slightly
inaccurate. Under normal circumstances this is not an issue. However,
when CPU utilization reaches 100%, the calculated result can
exceed 100%. For example, with raw data like the following:
sample_period 400000134 new_stat 83648414036 old_stat 83247417494
sample_period=400000134/2^24=23
new_stat=83648414036/2^24=4985
old_stat=83247417494/2^24=4961
util=DIV_ROUND_UP(100*(4985-4961), 23)=105%
The softlockup report then prints the following log:
CPU#3 Utilization every 0s during lockup:
#1: 0% system, 0% softirq, 105% hardirq, 0% idle
#2: 0% system, 0% softirq, 105% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 105% hardirq, 0% idle
#5: 0% system, 0% softirq, 105% hardirq, 0% idle
To avoid confusion, cap the displayed utilization at 100% when
the calculated value exceeds it.
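
The arithmetic above can be reproduced in user space with a minimal
sketch. It assumes that get_16bit_precision() divides by 2^24 and keeps
the low 16 bits, as the example suggests. The demo_* names and the
zero-period guard are hypothetical and are not part of kernel/watchdog.c.

```c
#include <stdint.h>

/* Round-up integer division, written out so the sketch builds in user space. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Assumption: mirrors get_16bit_precision() as the example describes it,
 * i.e. divide by 2^24 and keep the low 16 bits.
 */
static uint16_t demo_16bit_precision(uint64_t raw)
{
	return (uint16_t)((raw >> 24) & 0xffff);
}

/*
 * Hypothetical helper: utilization in percent for one sample period.
 * When cap is non-zero the result is clamped to 100, as in the patch.
 */
static int demo_util(uint64_t sample_period, uint64_t new_raw,
		     uint64_t old_raw, int cap)
{
	int period_16 = demo_16bit_precision(sample_period);
	int new_16 = demo_16bit_precision(new_raw);
	int old_16 = demo_16bit_precision(old_raw);
	int util;

	if (period_16 == 0)
		return 0;	/* guard added for the sketch only */

	util = DEMO_DIV_ROUND_UP(100 * (new_16 - old_16), period_16);
	if (cap && util > 100)
		util = 100;
	return util;
}
```

With the raw values from the example,
demo_util(400000134, 83648414036, 83247417494, 0) evaluates to 105,
and the same call with cap set to 1 evaluates to 100.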
Signed-off-by: ZhenguoYao <yaozhenguo1@...il.com>
---
kernel/watchdog.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9c7134f7d2c4..29787996c69c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -444,6 +444,13 @@ static void update_cpustat(void)
old_stat = __this_cpu_read(cpustat_old[i]);
new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
util = DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16);
+ /*
+ * The 16-bit precision values come from integer division, so the
+ * lost low-order bits can push the result above 100%. Cap the
+ * displayed utilization at 100% to avoid confusion.
+ */
+ if (util > 100)
+ util = 100;
__this_cpu_write(cpustat_util[tail][i], util);
__this_cpu_write(cpustat_old[i], new_stat);
}
--
2.43.5