Message-ID: <Y+ZXLy0P0Sggbrxc@hirez.programming.kicks-ass.net>
Date: Fri, 10 Feb 2023 15:39:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Alexey Dobriyan <adobriyan@...il.com>,
Wei Li <liwei391@...wei.com>,
Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>,
Thomas Gleixner <tglx@...utronix.de>,
Yu Liao <liaoyu15@...wei.com>, Hillf Danton <hdanton@...a.com>,
Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH 4/6] timers/nohz: Add a comment about broken iowait counter update race

On Fri, Feb 10, 2023 at 03:09:15PM +0100, Frederic Weisbecker wrote:
> The per-cpu iowait task counter is incremented locally upon sleeping.
> But since the task can be woken to (and by) another CPU, the counter may
> then be decremented remotely. This is the source of a race between
> readers and the writer of the idle/iowait sleep time.
>
> The following scenario shows an example where a /proc/stat reader
> observes a pending sleep time as IO, whereas that same sleep time is
> eventually accounted as non-IO.
>
>     CPU 0                           CPU 1                           CPU 2
>     -----                           -----                           -----
>     //io_schedule() TASK A
>     current->in_iowait = 1
>     rq(0)->nr_iowait++
>     //switch to idle
>                                     // READ /proc/stat
>                                     // See nr_iowait_cpu(0) == 1
>                                     return ts->iowait_sleeptime +
>                                            ktime_sub(ktime_get(), ts->idle_entrytime)
>
>                                                                     //try_to_wake_up(TASK A)
>                                                                     rq(0)->nr_iowait--
>     //idle exit
>     // See nr_iowait_cpu(0) == 0
>     ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)
>
> As a result, subsequent reads of /proc/stat may observe the iowait time
> going backward.
>
> Unfortunately this is hard to fix, so just add a comment documenting
> the race.
It is far worse than that: the whole concept of per-cpu iowait is
absurd. Also see the comment near nr_iowait().
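A minimal single-threaded C sketch of the interleaving quoted above may help make the backward step concrete. It assumes hypothetical, simplified stand-ins (`struct cpu_idle_state`, `read_iowait()`, `idle_exit()`, `simulate_race()`) for the tick_sched fields and `rq->nr_iowait`; timestamps are plain integers rather than `ktime_t`, and no locking or seqcount is modeled. The reader adds the pending idle period to iowait if the counter is nonzero at read time, while the writer decides at idle exit, which is where the two can disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the per-CPU idle accounting
 * state (tick_sched fields) and rq->nr_iowait. Times are plain integers
 * instead of ktime_t; there is no locking or seqcount here. */
struct cpu_idle_state {
	int64_t idle_entrytime;   /* when this CPU entered idle */
	int64_t idle_sleeptime;   /* accumulated non-IO idle time */
	int64_t iowait_sleeptime; /* accumulated IO-wait idle time */
	int nr_iowait;            /* tasks that went to sleep here in iowait */
	int idle_active;          /* nonzero while the CPU is idle */
};

/* Reader (e.g. /proc/stat): the pending idle period counts as iowait
 * if the counter is nonzero at the moment of the read. */
static int64_t read_iowait(const struct cpu_idle_state *ts, int64_t now)
{
	int64_t t = ts->iowait_sleeptime;

	if (ts->idle_active && ts->nr_iowait > 0)
		t += now - ts->idle_entrytime;
	return t;
}

/* Writer (idle exit): the whole idle period is accounted according to
 * the counter value seen at exit time, not at read time. */
static void idle_exit(struct cpu_idle_state *ts, int64_t now)
{
	int64_t delta;

	assert(ts->idle_active);
	delta = now - ts->idle_entrytime;
	if (ts->nr_iowait > 0)
		ts->iowait_sleeptime += delta;
	else
		ts->idle_sleeptime += delta;
	ts->idle_active = 0;
}

struct race_result {
	int64_t first_read;  /* iowait seen by CPU 1 while CPU 0 is idle */
	int64_t second_read; /* iowait seen after CPU 0 has exited idle */
};

/* Replays the quoted CPU 0 / CPU 1 / CPU 2 interleaving in program order. */
struct race_result simulate_race(void)
{
	struct cpu_idle_state cpu0 = { 0 };
	struct race_result r;

	/* CPU 0: io_schedule() of task A, then switch to idle at t=100. */
	cpu0.nr_iowait++;
	cpu0.idle_entrytime = 100;
	cpu0.idle_active = 1;

	/* CPU 1: reads at t=150 and sees nr_iowait == 1. */
	r.first_read = read_iowait(&cpu0, 150);

	/* CPU 2: try_to_wake_up(task A) decrements CPU 0's counter remotely. */
	cpu0.nr_iowait--;

	/* CPU 0: idle exit at t=200 sees nr_iowait == 0, so the whole
	 * period is added to idle_sleeptime instead of iowait_sleeptime. */
	idle_exit(&cpu0, 200);

	/* A later read no longer includes the time reported earlier. */
	r.second_read = read_iowait(&cpu0, 250);
	return r;
}
```

With these numbers, `simulate_race()` yields `first_read == 50` and `second_read == 0`: the 50 units the first reader attributed to iowait end up in `idle_sleeptime` instead, so the reported iowait goes backward.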