[<prev] [next>] [day] [month] [year] [list]
Message-Id: <20190712082209.16073-1-alan.christopher.jenkins@gmail.com>
Date:   Fri, 12 Jul 2019 09:22:09 +0100
From:   Alan Jenkins <alan.christopher.jenkins@...il.com>
To:     linux-doc@...r.kernel.org, Jonathan Corbet <corbet@....net>
Cc:     Doug Smythies <dsmythies@...us.net>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org,
        Alan Jenkins <alan.christopher.jenkins@...il.com>
Subject: [PATCH] Documentation: proc.txt: emphasize that iowait cannot be relied on
CPU "iowait" time in /proc/stat does not work on my laptop.
I saw the documentation mention several problems with "iowait".  However
each problem appeared to be qualified.  It gave me the impression I could
probably account for each problem.  My impression was wrong.
There are a couple of writeups explaining the specific problem I had.[1][2]
[1] "[RFC PATCH 0/8] rework iowait accounting", 2014-06-26:
    https://lore.kernel.org/lkml/53ABE28F.6010402@jp.fujitsu.com/
[2] A recent writeup by myself:
    https://unix.stackexchange.com/questions/517757/my-basic-assumption-about-system-iowait-does-not-hold/527836#527836
This might just be me.  Partly, my small knowledge about the scheduler
allowed for false assumptions.  But I think we can emphasize more strongly
how broken iowait is on SMP.  Overall, I aim to make it sound much scarier
to analyze iowait.  I add some precise details, and also some
anxiety-inducing vagueness :-).
[Detailed reasons for the specific points I included:]
1. Let us say that "iowait _can_ be massively under-accounted".  It is
likely to remain true in future.  At least since v4.16, the
under-accounting problem seems very exposed on non-virtual, multi-CPU
systems.  In theory the wheel might turn again; this exposure might be
reduced in future.  But even on v4.15, I can reproduce the problem using
CPU affinity.
2. Point to NO_HZ_IDLE, as a good hint towards i) the nature of the problem
and ii) and how widespread it is.  To give a more comprehensive picture,
also point to NO_HZ_FULL and VIRT_CPU_ACCOUNTING_NATIVE.
Setting down my exact scenario would require a lot of specifics.  That
would be going beyond the point.  We could link to one of the writeups as
well, but I don't think we need to.
3. My own "use case" did not expose the problem when I ran it on a virtual
machine.  Even using my CPU affinity method.[2]  I haven't tracked down
why.  This is a significant qualification to point 1.  Explicitly
acknowledge this.  It's a pain, but it makes the main point easier to
verify, and hence more credible.
(I suspect this is common at least to small test VMs.  It appears true
for both a Fedora 30 VM (5.1.x) and a Debian 9 VM (4.9.x).  I also tried
some different storage options, virtio-blk v.s. virtio-scsi v.s. isilogic.)
[:end of details]
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@...il.com>
---
 Documentation/filesystems/proc.txt | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 66cad5c86171..f1da71cd276e 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1348,16 +1348,23 @@ second).  The meanings of the columns are as follows, from left to right:
 - nice: niced processes executing in user mode
 - system: processes executing in kernel mode
 - idle: twiddling thumbs
-- iowait: In a word, iowait stands for waiting for I/O to complete. But there
-  are several problems:
-  1. Cpu will not wait for I/O to complete, iowait is the time that a task is
-     waiting for I/O to complete. When cpu goes into idle state for
-     outstanding task io, another task will be scheduled on this CPU.
-  2. In a multi-core CPU, the task waiting for I/O to complete is not running
-     on any CPU, so the iowait of each CPU is difficult to calculate.
-  3. The value of iowait field in /proc/stat will decrease in certain
+- iowait: In a word, iowait stands for waiting for I/O to complete.  This
+  number is not reliable.  The problems include:
+  1. A CPU does not wait for I/O to complete; iowait is the time that a task
+     is waiting for I/O to complete.  When a CPU goes into idle state for
+     outstanding task I/O, another task will be scheduled on this CPU.
+  2. iowait was extended to support systems with multiple CPUs. But the
+     extended version is misleading.  Consider a two-CPU system, where you see
+     50% iowait.  This could represent two tasks that could use 100% of both
+     CPUs, if they were not waiting for I/O.
+  3. iowait can be massively under-accounted on modern kernels.  The iowait
+     code does not account for the behaviour of NO_HZ_IDLE, NO_HZ_FULL, or
+     VIRT_CPU_ACCOUNTING_NATIVE on multi-CPU systems.  The amount of
+     under-accounting varies depending on the exact system configuration and
+     kernel version.  The effects might be less obvious when running in a
+     virtual machine.
+  4. The value of iowait field in /proc/stat will decrease in certain
      conditions.
-  So, the iowait is not reliable by reading from /proc/stat.
 - irq: servicing interrupts
 - softirq: servicing softirqs
 - steal: involuntary wait
-- 
2.21.0
Powered by blists - more mailing lists
 
