Message-Id: <20251125093644.1786359-1-jackzxcui1989@163.com>
Date: Tue, 25 Nov 2025 17:36:44 +0800
From: Xin Zhao <jackzxcui1989@....com>
To: frederic@...nel.org
Cc: adobriyan@...il.com,
hdanton@...a.com,
liaoyu15@...wei.com,
linux-kernel@...r.kernel.org,
liwei391@...wei.com,
mingo@...nel.org,
mirsad.todorovac@....unizg.hr,
peterz@...radead.org,
tglx@...utronix.de
Subject: Re: [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race
On Wed, 22 Feb 2023 15:46:45 +0100 Frederic Weisbecker <frederic@...nel.org> wrote:
> The following scenario shows an example where a /proc/stat reader
> observes a pending sleep time as IO whereas that pending sleep time
> later eventually gets accounted as non-IO.
>
> CPU 0                       CPU 1                       CPU 2
> -----                       -----                       ------
> //io_schedule() TASK A
> current->in_iowait = 1
> rq(0)->nr_iowait++
> //switch to idle
>                             // READ /proc/stat
>                             // See nr_iowait_cpu(0) == 1
>                             return ts->iowait_sleeptime +
>                                    ktime_sub(ktime_get(), ts->idle_entrytime)
>
>                                                         //try_to_wake_up(TASK A)
>                                                         rq(0)->nr_iowait--
> //idle exit
> // See nr_iowait_cpu(0) == 0
> ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
>
> As a result subsequent reads on /proc/stat may expose backward progress.
...
> *
> * Return the cumulative idle time (since boot) for a given
> - * CPU, in microseconds.
> + * CPU, in microseconds. Note this is partially broken due to
> + * the counter of iowait tasks that can be remotely updated without
> + * any synchronization. Therefore it is possible to observe backward
> + * values within two consecutive reads.
> *
> * This time is measured via accounting rather than sampling,
> * and is as accurate as ktime_get() is.
Dear Frederic,
I did hit this situation in our project: both the iowait and the idle statistics decreased
between consecutive updates. We saw it while testing standby wake-up.
From the illustration you provided above I can understand well how the iowait statistics
could decrease, but I cannot figure out why the idle statistics would also decrease.
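To check my reading of the illustration, here is a minimal user-space sketch in Python that
replays it. It is only a toy model, not kernel code: the TickSched class, the helper functions
and the time values are my own assumptions, and only the field names are borrowed from the
illustration. In this replay the reported iowait time goes backward, while the reported idle
time does not.

```python
# Toy user-space model of the quoted scenario. NOT kernel code: the class,
# the helpers and the time values below are illustrative assumptions.

class TickSched:
    """Per-CPU accounting state, named after the fields in the illustration."""
    def __init__(self):
        self.idle_sleeptime = 0
        self.iowait_sleeptime = 0
        self.idle_entrytime = 0
        self.idle_active = False


def read_iowait(ts, nr_iowait, now):
    # Reader side: the pending sleep is reported as iowait only if
    # nr_iowait is non-zero at the moment of the read.
    if ts.idle_active and nr_iowait > 0:
        return ts.iowait_sleeptime + (now - ts.idle_entrytime)
    return ts.iowait_sleeptime


def read_idle(ts, nr_iowait, now):
    # Reader side: the pending sleep is reported as idle only if
    # nr_iowait is zero at the moment of the read.
    if ts.idle_active and nr_iowait == 0:
        return ts.idle_sleeptime + (now - ts.idle_entrytime)
    return ts.idle_sleeptime


def idle_exit(ts, nr_iowait, now):
    # Idle exit: the whole sleep is accounted according to nr_iowait
    # as seen at exit time, not as seen by earlier readers.
    delta = now - ts.idle_entrytime
    if nr_iowait > 0:
        ts.iowait_sleeptime += delta
    else:
        ts.idle_sleeptime += delta
    ts.idle_active = False


def replay():
    ts = TickSched()
    nr_iowait = 0

    # t=0, CPU 0: task A calls io_schedule(), then CPU 0 switches to idle.
    nr_iowait += 1
    ts.idle_entrytime = 0
    ts.idle_active = True

    # t=5, CPU 1: reads /proc/stat and sees nr_iowait == 1.
    iowait_first = read_iowait(ts, nr_iowait, 5)
    idle_first = read_idle(ts, nr_iowait, 5)

    # t=8, CPU 2: try_to_wake_up(task A) decrements nr_iowait remotely.
    nr_iowait -= 1

    # t=10, CPU 0: idle exit sees nr_iowait == 0.
    idle_exit(ts, nr_iowait, 10)

    # t=12: a second read of /proc/stat.
    iowait_second = read_iowait(ts, nr_iowait, 12)
    idle_second = read_idle(ts, nr_iowait, 12)
    return iowait_first, iowait_second, idle_first, idle_second


if __name__ == "__main__":
    print(replay())
```

The first read attributes the 5 pending units to iowait, but at idle exit the whole sleep is
accounted as idle, so the second iowait read is smaller than the first.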
I believe that the rq's iowait count (nr_iowait) only increases when the prev task is scheduled
out, so the increase should only happen on the local CPU. A remote CPU only decreases the rq's
iowait count when it wakes the task up. I haven't found any situation where a remote CPU could
cause an increase in the rq's iowait count. Therefore, as I understand it, the idle statistics
should not decrease between two updates, and I cannot see what could have caused the decrease
we observed.
Could you give me some hints? Another illustration like the one above would be even better.
Thank you very much!
Xin Zhao