linux-kernel - Re: [PATCH] watchdog/softlockup:Fix incorrect CPU utilization output during softlockup

Message-ID: <CA+WzARmV27Oo82zhZpmrbtBOG+e6UMVcCUa9ShcyCW6YEGK5jg@mail.gmail.com>
Date: Mon, 18 Aug 2025 16:16:41 +0800
From: Zhenguo Yao <yaozhenguo1@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: tglx@...utronix.de, yaoma@...ux.alibaba.com, max.kellermann@...os.com, 
	lihuafei1@...wei.com, yaozhenguo@...com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog/softlockup:Fix incorrect CPU utilization output
 during softlockup

On Mon, Aug 18, 2025 at 10:18, Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Tue, 12 Aug 2025 16:25:10 +0800 yaozhenguo <yaozhenguo1@...il.com> wrote:
>
> > From: ZhenguoYao <yaozhenguo1@...il.com>
> >
> > Since we use 16-bit precision, the raw data will undergo
> > integer division, which may sometimes result in data loss.
> > This can lead to slightly inaccurate CPU utilization calculations.
> > Under normal circumstances, this isn’t an issue.  However,
> > when CPU utilization reaches 100%, the calculated result might
> > exceed 100%.  For example, with raw data like the following:
> >
> > sample_period 400000134 new_stat 83648414036 old_stat 83247417494
> >
> > sample_period=400000134/2^24=23
> > new_stat=83648414036/2^24=4985
> > old_stat=83247417494/2^24=4961
> > util=105%
> >
> > The following log is then printed:
> >
> > CPU#3 Utilization every 0s during lockup:
> >     #1:   0% system,          0% softirq,   105% hardirq,     0% idle
> >     #2:   0% system,          0% softirq,   105% hardirq,     0% idle
> >     #3:   0% system,          0% softirq,   100% hardirq,     0% idle
> >     #4:   0% system,          0% softirq,   105% hardirq,     0% idle
> >     #5:   0% system,          0% softirq,   105% hardirq,     0% idle
> >
> > To avoid confusion, we enforce a 100% display cap when
> > calculations exceed this threshold.
> >
> > ...
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -444,6 +444,13 @@ static void update_cpustat(void)
> >               old_stat = __this_cpu_read(cpustat_old[i]);
> >               new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
> >               util = DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16);
> > +             /* Since we use 16-bit precision, the raw data will undergo
>
>                 /*
>                  * Since ...
>
> please.
>
> > +              * integer division, which may sometimes result in data loss,
> > +              * and then result might exceed 100%. To avoid confusion,
> > +              * we enforce a 100% display cap when calculations exceed this threshold.
> > +              */
> > +             if (util > 100)
> > +                     util = 100;
> >               __this_cpu_write(cpustat_util[tail][i], util);
> >               __this_cpu_write(cpustat_old[i], new_stat);
> >       }
>
> Can we do something to make this output more accurate?  For example,
>
>         return (data_ns + (1 << 23)) >> 24LL;
>
> would round to the nearest multiple of 16.8ms?
>
>
Yes.
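
For reference, here is a minimal userspace sketch of the arithmetic with the
sample values quoted above. The helper names (to_16bit_trunc, to_16bit_round,
util_percent) are hypothetical and exist only for this illustration; they are
not the kernel's functions. The truncating variant is assumed from the
divisions by 2^24 in the commit message, and the sketch assumes a nonzero
period.

```c
#include <stdint.h>

/* Truncating conversion, assumed from the "/2^24" steps in the commit message. */
static uint16_t to_16bit_trunc(uint64_t data_ns)
{
	return (uint16_t)(data_ns >> 24); /* 2^24 ns is roughly 16.8 ms */
}

/* Round-to-nearest variant, following the expression suggested in the review. */
static uint16_t to_16bit_round(uint64_t data_ns)
{
	return (uint16_t)((data_ns + (1ULL << 23)) >> 24);
}

/*
 * Same shape as DIV_ROUND_UP(100 * (new_stat - old_stat), sample_period_16).
 * The subtraction is done in 16 bits so that counter wraparound is tolerated.
 * The caller must pass a nonzero period.
 */
static unsigned int util_percent(uint16_t new_stat, uint16_t old_stat,
				 uint16_t period)
{
	unsigned int delta = (uint16_t)(new_stat - old_stat);

	return (100 * delta + period - 1) / period;
}
```

With sample_period 400000134, new_stat 83648414036 and old_stat 83247417494,
truncation gives 23, 4985 and 4961, so util_percent() returns 105. Rounding
gives 24, 4986 and 4962, so it returns 100. Rounding fixes this particular
sample, but rounding error combined with DIV_ROUND_UP may still push other
samples slightly above 100, so the cap could remain useful as a backstop.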