Message-ID: <CA+WzARmV27Oo82zhZpmrbtBOG+e6UMVcCUa9ShcyCW6YEGK5jg@mail.gmail.com>
Date: Mon, 18 Aug 2025 16:16:41 +0800
From: Zhenguo Yao <yaozhenguo1@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: tglx@...utronix.de, yaoma@...ux.alibaba.com, max.kellermann@...os.com,
lihuafei1@...wei.com, yaozhenguo@...com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog/softlockup:Fix incorrect CPU utilization output
during softlockup
On Mon, Aug 18, 2025 at 10:18, Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Tue, 12 Aug 2025 16:25:10 +0800 yaozhenguo <yaozhenguo1@...il.com> wrote:
>
> > From: ZhenguoYao <yaozhenguo1@...il.com>
> >
> > Since we use 16-bit precision, the raw data will undergo
> > integer division, which may sometimes result in data loss.
> > This can lead to slightly inaccurate CPU utilization calculations.
> > Under normal circumstances, this isn’t an issue. However,
> > when CPU utilization reaches 100%, the calculated result might
> > exceed 100%. For example, with raw data like the following:
> >
> > sample_period 400000134 new_stat 83648414036 old_stat 83247417494
> >
> > sample_period=400000134/2^24=23
> > new_stat=83648414036/2^24=4985
> > old_stat=83247417494/2^24=4961
> > util=105%
> >
> > The following log is then printed:
> >
> > CPU#3 Utilization every 0s during lockup:
> > #1: 0% system, 0% softirq, 105% hardirq, 0% idle
> > #2: 0% system, 0% softirq, 105% hardirq, 0% idle
> > #3: 0% system, 0% softirq, 100% hardirq, 0% idle
> > #4: 0% system, 0% softirq, 105% hardirq, 0% idle
> > #5: 0% system, 0% softirq, 105% hardirq, 0% idle
> >
> > To avoid confusion, we enforce a 100% display cap when
> > calculations exceed this threshold.
> >
> > ...
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -444,6 +444,13 @@ static void update_cpustat(void)
> > old_stat = __this_cpu_read(cpustat_old[i]);
> > new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
> > util = DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16);
> > + /* Since we use 16-bit precision, the raw data will undergo
>
> /*
> * Since ...
>
> please.
>
> > + * integer division, which may sometimes result in data loss,
> > + * and then result might exceed 100%. To avoid confusion,
> > + * we enforce a 100% display cap when calculations exceed this threshold.
> > + */
> > + if (util > 100)
> > + util = 100;
> > __this_cpu_write(cpustat_util[tail][i], util);
> > __this_cpu_write(cpustat_old[i], new_stat);
> > }
>
> Can we do something to make this output more accurate? For example,
>
> return (data_ns + (1 << 23)) >> 24LL;
>
> would round to the nearest multiple of 16.8ms?
>
>
Yes.