Message-ID: <x49cz0hxdfa.fsf@segfault.boston.devel.redhat.com>
Date:   Mon, 24 Jul 2023 16:27:53 -0400
From:   Jeff Moyer <jmoyer@...hat.com>
To:     Pavel Begunkov <asml.silence@...il.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Greg KH <gregkh@...uxfoundation.org>,
        Phil Elwell <phil@...pberrypi.com>, andres@...razel.de,
        david@...morbit.com, hch@....de, io-uring@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, linux-xfs@...r.kernel.org,
        stable <stable@...r.kernel.org>, riel@...riel.com
Subject: Re: [PATCH] io_uring: Use io_schedule* in cqring wait

Pavel Begunkov <asml.silence@...il.com> writes:

> On 7/24/23 16:58, Jens Axboe wrote:
>> Even though I don't think this is an actual problem, it is a bit
>> confusing that you get 100% iowait while waiting without having IO
>> pending. So I do think the suggested patch is probably worthwhile
>> pursuing. I'll post it and hopefully have Andres test it too, if he's
>> available.
>
> Emmm, what's the definition of the "IO" state? Unless we can say what exactly
> it is there will be no end to adjustments, because I can easily argue that
> CQ waiting by itself is IO.
> Do we consider sleep(N) to be "IO"? I don't think the kernel uses io
> schedule around that, and so it'd be different from io_uring waiting for
> a timeout request. What about epoll waiting, etc.?

See Documentation/filesystems/proc.rst (and mainly commit 9c240d757658
("Change the document about iowait")):

- iowait: In a word, iowait stands for waiting for I/O to complete. But there
  are several problems:

  1. CPU will not wait for I/O to complete, iowait is the time that a task is
     waiting for I/O to complete. When CPU goes into idle state for
     outstanding task I/O, another task will be scheduled on this CPU.
  2. In a multi-core CPU, the task waiting for I/O to complete is not running
     on any CPU, so the iowait of each CPU is difficult to calculate.
  3. The value of iowait field in /proc/stat will decrease in certain
     conditions.

  So, the iowait is not reliable by reading from /proc/stat.
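
For reference, the field that text describes is the fifth counter on the
"cpu" lines of /proc/stat (user, nice, system, idle, iowait, ...).  A
minimal sketch of pulling it out follows; the function name and the sample
text are made up for illustration, and it parses a string rather than the
live file so it can be run anywhere:

```python
def iowait_ticks(stat_text):
    """Return the iowait counter from the aggregate "cpu" line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled "cpu"; per-CPU lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            # Columns after the label: user nice system idle iowait ...
            return int(fields[5])
    raise ValueError("no aggregate cpu line found")


# Hypothetical sample, not taken from a real machine.
SAMPLE = "cpu  100 0 50 1000 25 0 3 0 0 0\ncpu0 60 0 20 500 10 0 1 0 0 0\n"
print(iowait_ticks(SAMPLE))
```

On a live system the same function could be fed open("/proc/stat").read().
The value is in clock ticks, and per the caveats quoted above it should be
treated as a rough indicator only.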

Also, vmstat(8):
       wa: Time spent waiting for IO.  Prior to Linux 2.5.41, included in idle.

iostat/mpstat man pages:
              %iowait
                     Show the percentage of time that the  CPU  or  CPUs  were
                     idle  during which the system had an outstanding disk I/O
                     request.

sar(1):
              %iowait
                     Percentage of time that the CPU or CPUs were idle  during
                     which the system had an outstanding disk I/O request.
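
A percentage like the one those man pages describe can be approximated from
two /proc/stat samples as the iowait delta over the total delta.  This is
only a sketch of the arithmetic, not the actual sysstat code; the helper
names and sample lines are hypothetical, and negative deltas are clamped to
zero as an assumption, since the counter is documented as able to decrease:

```python
def cpu_counters(stat_line):
    # "cpu  user nice system idle iowait irq softirq steal ..." -> list of ints
    return [int(x) for x in stat_line.split()[1:]]


def iowait_percent(before, after):
    """Share of the interval between two 'cpu' lines that was counted as iowait."""
    deltas = [max(0, a - b)
              for b, a in zip(cpu_counters(before), cpu_counters(after))]
    # Sum user..steal only; guest time is already accounted in user/nice.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Hypothetical samples taken some interval apart.
BEFORE = "cpu  100 0 50 1000 25 0 3 0 0 0"
AFTER = "cpu  110 0 55 1060 50 0 3 0 0 0"
print(iowait_percent(BEFORE, AFTER))
```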

iowait was initially introduced in 2002 by Rik van Riel in historical
git commit 7b88e5e0bdf25 ("[PATCH] "io wait" process accounting").  The
changelog from akpm reads:

    Patch from Rik adds "I/O wait" statistics to /proc/stat.
    
    This allows us to determine how much system time is being spent
    awaiting IO completion.  This is an important statistic, as it tends to
    directly subtract from job completion time.
    
    procps-2.0.9 is OK with this, but doesn't report it.

I vaguely recall there was confusion from users about why the system was
idle when running database workloads.  Maybe Rik can remember more
clearly.

Anyway, as you can see, the definition is murky, at best.  I don't think
we should overthink it.  I agree with the principle of Jens'
patch--let's just not surprise users with a change in behavior.

Cheers,
Jeff
