Message-ID: <aS4FztjNAwVNfoUk@gmail.com>
Date: Mon, 1 Dec 2025 22:17:02 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Xin Zhao <jackzxcui1989@....com>
Cc: anna-maria@...utronix.de, frederic@...nel.org, tglx@...utronix.de,
kuba@...nel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 2/2] timers/nohz: Avoid /proc/stat idle/iowait fluctuation when cpu hotplug

* Xin Zhao <jackzxcui1989@....com> wrote:
> The idle and iowait statistics in /proc/stat are obtained through
> get_idle_time and get_iowait_time. Assuming CONFIG_NO_HZ_COMMON is
> enabled, when a CPU is online, the idle and iowait values use the
> idle_sleeptime and iowait_sleeptime statistics from tick_cpu_sched, but
> use the CPUTIME_IDLE and CPUTIME_IOWAIT items from kernel_cpustat when the
> CPU is offline. Although /proc/stat does not print statistics for offline
> CPUs, it still prints aggregated statistics for all possible CPUs.
> tick_cpu_sched and kernel_cpustat are maintained by different logic,
> leading to a significant gap. The first line of the data below shows the
> /proc/stat output when only one CPU remains online after CPU offlining; the
> second line shows the /proc/stat output after all CPUs are brought back online:
>
> cpu 2408558 2 916619 4275883 5403 123758 64685 0 0 0
> cpu 2408588 2 916693 4200737 4184 123762 64686 0 0 0

Yeah, that outlier indeed looks suboptimal, and there's
very little user-space tooling can do to detect it. I
think your suggestion, to use the 'frozen' values of an
offline CPU, might as well be the right approach.

What value is printed if the CPU was never online, is
it properly initialized to zero?

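
For illustration only (this is not part of the patch or of either mail): a small userspace sketch that parses two aggregate "cpu" lines in the /proc/stat layout and reports which counters went backwards. The names cpu_sample, parse_cpu_line, regressed_fields and report_regressions are hypothetical helpers invented for this sketch. A check like this can only notice a regression after the fact by comparing consecutive samples; it cannot tell whether CPU hotplug caused it.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NFIELDS 10

/* Column order of the aggregate "cpu" line, as documented in proc(5). */
const char *const field_name[NFIELDS] = {
	"user", "nice", "system", "idle", "iowait",
	"irq", "softirq", "steal", "guest", "guest_nice",
};

/* Hypothetical container for one sample of the aggregate line. */
struct cpu_sample {
	uint64_t v[NFIELDS];
};

/* Parse one aggregate "cpu ..." line. Returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	char *end;
	int i;

	assert(line && s);
	/* Per-CPU lines ("cpu0 ...") are rejected by the trailing space. */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	line += 4;
	for (i = 0; i < NFIELDS; i++) {
		errno = 0;
		s->v[i] = strtoull(line, &end, 10);
		if (end == line || errno)
			return -1;
		line = end;
	}
	return 0;
}

/* Bitmask of fields whose value decreased from prev to cur (bit i = field i). */
unsigned int regressed_fields(const struct cpu_sample *prev,
			      const struct cpu_sample *cur)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < NFIELDS; i++)
		if (cur->v[i] < prev->v[i])
			mask |= 1u << i;
	return mask;
}

/* Print every counter that went backwards, with the size of the drop. */
void report_regressions(const struct cpu_sample *prev,
			const struct cpu_sample *cur)
{
	unsigned int mask = regressed_fields(prev, cur);
	int i;

	for (i = 0; i < NFIELDS; i++)
		if (mask & (1u << i))
			printf("%s dropped by %" PRIu64 "\n",
			       field_name[i], prev->v[i] - cur->v[i]);
}
```

Fed the two lines quoted above, regressed_fields() flags only idle and iowait: idle falls from 4275883 to 4200737 (a drop of 75146) and iowait from 5403 to 4184 (a drop of 1219), while every other counter is unchanged or grows slightly.
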
> Obviously, the other values do not fluctuate significantly, while the
> idle/iowait statistics show a substantial decrease, which makes system CPU
> monitoring troublesome.
> Introduce get_cpu_idle_time_us_raw and get_cpu_iowait_time_us_raw, so that
> the /proc/stat logic can use them to get the last raw value of idle_sleeptime
> and iowait_sleeptime from tick_cpu_sched without any calculation when the CPU
> is offline. This avoids /proc/stat idle/iowait fluctuation during CPU hotplug.
>
> Signed-off-by: Xin Zhao <jackzxcui1989@....com>
> ---
>  fs/proc/stat.c           |  4 ++++
>  include/linux/tick.h     |  4 ++++
>  kernel/time/tick-sched.c | 46 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 8b444e862..de13a2e1c 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -28,6 +28,8 @@ u64 get_idle_time(struct kernel_cpustat *kcs, int cpu)
>
>  	if (cpu_online(cpu))
>  		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> +	else
> +		idle_usecs = get_cpu_idle_time_us_raw(cpu);
>
>  	if (idle_usecs == -1ULL)
>  		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
> @@ -44,6 +46,8 @@ static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
>
>  	if (cpu_online(cpu))
>  		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
> +	else
> +		iowait_usecs = get_cpu_iowait_time_us_raw(cpu);

So why not just use the get_cpu_idle_time_us() and
get_cpu_iowait_time_us() values unconditionally, for
all possible_cpus?

The raw/non-raw distinction makes very little sense in
this context, the read_seqlock_retry loop will always
succeed after a single step (because there are no
writers), so the behavior of the full get_cpu_idle/iowait_time_us()
functions should be close to the _raw() variants.

Patch would be much simpler that way.
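
For illustration only: a single-threaded userspace model of the retry argument above. It is not the kernel's seqcount implementation, and every name in it (model_seqcount, model_tick_sched, model_read_idle and so on) is hypothetical. It models only the sequence-counter read loop, not any other work the real get_cpu_idle_time_us() does, and shows that with no writer active the loop body runs exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for a sequence counter; NOT the kernel's seqcount_t. */
struct model_seqcount {
	atomic_uint seq;
};

/* Hypothetical, heavily reduced stand-in for the per-CPU tick state. */
struct model_tick_sched {
	struct model_seqcount seq;
	uint64_t idle_sleeptime_us;
};

/* Begin a read section: wait until no writer is mid-update (even count). */
unsigned int model_read_begin(struct model_seqcount *s)
{
	unsigned int v;

	while ((v = atomic_load(&s->seq)) & 1)
		;
	return v;
}

/* Nonzero if a writer ran since model_read_begin() returned 'start'. */
int model_read_retry(struct model_seqcount *s, unsigned int start)
{
	return atomic_load(&s->seq) != start;
}

/* Writer side: the count is odd while the update is in flight. */
void model_write_idle(struct model_tick_sched *ts, uint64_t us)
{
	assert(ts);
	atomic_fetch_add(&ts->seq.seq, 1);
	ts->idle_sleeptime_us = us;
	atomic_fetch_add(&ts->seq.seq, 1);
}

/*
 * Reader side: retry until a consistent snapshot is obtained.
 * '*passes' counts how many times the loop body ran. This demo is
 * single-threaded, so the plain load of idle_sleeptime_us is safe here.
 */
uint64_t model_read_idle(struct model_tick_sched *ts, unsigned int *passes)
{
	unsigned int start;
	uint64_t us;

	*passes = 0;
	do {
		start = model_read_begin(&ts->seq);
		us = ts->idle_sleeptime_us;
		(*passes)++;
	} while (model_read_retry(&ts->seq, start));

	return us;
}
```

In this model, once model_write_idle() has finished and nothing else writes, model_read_idle() returns the stored value after a single pass. That is the situation described above for a CPU with no writers.
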

Thanks,

	Ingo