lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e82b9d7c-81e5-dd80-b9c0-f5f065344e2f@gmail.com>
Date:   Fri, 5 Jul 2019 12:25:46 +0100
From:   Alan Jenkins <alan.christopher.jenkins@...il.com>
To:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Cc:     Doug Smythies <dsmythies@...us.net>, linux-pm@...r.kernel.org
Subject: Re: iowait v.s. idle accounting is "inconsistent" - iowait is too low

Hi, scheduler experts!

My cpu "iowait" time appears to be reported incorrectly.  Do you know 
why this could happen?

Doug helped me - it was he who noticed different behaviour on v4.15 vs 
v4.16 vs v4.17+.  So I have some confirmation of this.  I don't think 
Doug mentioned what hardware he ran the kernels on.  lscpu says my 
hardware is "Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz".

I tested using "dd" :-

dd if=bigfile bs=1M iflag=direct of=/dev/null

  (1) E.g. I get the expected result if I pin "dd" to the "right" cpu.  
Which cpu varies; it has often been cpu2.  At the moment I have booted 
5.2-rc5-ish.  Here I run "dd" with "taskset -c 0", and "iowait" worked 
as expected:

top - 11:01:47 up 15:10,  2 users,  load average: 1.07, 0.86, 0.86
Tasks: 288 total,   1 running, 287 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.3 us,  3.3 sy,  0.0 ni,  0.0 id, 94.3 wa,  0.7 hi,  1.3 si,  0.0 st
%Cpu1  :  0.3 us,  0.7 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu2  :  1.0 us,  0.7 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu3  :  0.7 us,  0.3 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7854.0 total,    432.2 free,   4616.4 used,   2805.4 buff/cache
MiB Swap:   2048.0 total,   1978.2 free,     69.8 used.   2498.0 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
31849 alan-sy+  20   0  216052   2836   1800 D   3.0   0.0   0:00.58 dd
24220 alan-sy+  20   0 3339828 232160 126720 S   0.7   2.9   1:53.14 gnome-shell
...

(I have also used "atop" and "vmstat 3". "atop" shows both total and 
per-cpu iowait, idle, etc. "vmstat 3" just shows a total, but all the 
old values stay on-screen).

  (2) But compare running "dd" with "taskset -c 1":

%Cpu0  :  0.3 us,  0.3 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.7 hi,  1.3 si,  0.0 st
%Cpu1  :  0.3 us,  3.0 sy,  0.0 ni, 83.7 id, 12.6 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu2  :  1.0 us,  0.3 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.3 us,  0.7 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

  (3) If I don't use "taskset", "dd" generally doesn't sit on the 
"right" cpu, and so I get don't see the right "iowait".  Here's "top -d 
30".  Over this longer interval, "dd" appears to spend a quarter of its 
time on the "right" cpu:

%Cpu0  :  0.5 us,  1.2 sy,  0.0 ni, 74.4 id, 22.2 wa,  0.5 hi,  1.3 si,  0.0 st
%Cpu1  :  0.7 us,  1.3 sy,  0.0 ni, 92.3 id,  5.7 wa,  0.1 hi,  0.0 si,  0.0 st
%Cpu2  :  0.5 us,  0.9 sy,  0.0 ni, 95.1 id,  3.3 wa,  0.1 hi,  0.0 si,  0.0 st
%Cpu3  :  0.7 us,  0.9 sy,  0.0 ni, 94.6 id,  3.6 wa,  0.1 hi,  0.0 si,  0.0 st

This point (3) does not apply to 4.15.  On 4.15, it seems "dd" naturally 
sits on the "right" cpu, so I get the "right" iowait". But if I pin "dd" 
to a different cpu, I get the "wrong" iowait again.

I bisected 4.15-4.16.  The first "bad" commit was 806486c377e3 
"sched/fair: Do not migrate if the prev_cpu is idle"

  (4) I can get the "right" iowait regardless of which cpu, if I boot 
with "nohz=off", or if I suppress nohz by dynamically disabling all 
cpuidle states except for state0 (POLL).

This point (4) does not apply to 4.16. On 4.15 and 4.16, suppressing 
nohz does not help.  (So far, I did not test 4.15).

I bisected 4.16-4.17.    The first "new" commit was 554c8aa8ecad "sched: 
idle: Select idle state before stopping the tick"

  (5) I seem to get the "right" iowait regardless of which cpu, if I run 
inside a virtual machine.  I tested stock Fedora v5.1.? inside a KVM 
(virt-manager) virtual machine, which also had 4 cpus.  Whereas 
un-virtualized Fedora v5.1.? on my laptop, behaves as per points 1-4.

I read the documented limitations for "iowait" time. As far as I 
understand them, they don't explain such inconsistent values.

> - iowait: In a word, iowait stands for waiting for I/O to complete. But there
> are several problems:
>
> 1. Cpu will not wait for I/O to complete, iowait is the time that a task is
>    waiting for I/O to complete. When cpu goes into idle state for
>    outstanding task io, another task will be scheduled on this CPU.
>
> 2. In a multi-core CPU, the task waiting for I/O to complete is not running
>    on any CPU, so the iowait of each CPU is difficult to calculate.
>
> 3. The value of iowait field in /proc/stat will decrease in certain
>    conditions


Thanks for all the kernels
Alan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ