Date:   Thu, 8 Dec 2022 08:36:49 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Gulam Mohamed <gulam.mohamed@...cle.com>
Cc:     linux-block@...r.kernel.org, axboe@...nel.dk,
        philipp.reisner@...bit.com, lars.ellenberg@...bit.com,
        christoph.boehmwalder@...bit.com, minchan@...nel.org,
        ngupta@...are.org, senozhatsky@...omium.org, colyli@...e.de,
        kent.overstreet@...il.com, agk@...hat.com, snitzer@...nel.org,
        dm-devel@...hat.com, song@...nel.org, dan.j.williams@...el.com,
        vishal.l.verma@...el.com, dave.jiang@...el.com,
        ira.weiny@...el.com, junxiao.bi@...cle.com,
        martin.petersen@...cle.com, kch@...dia.com,
        drbd-dev@...ts.linbit.com, linux-kernel@...r.kernel.org,
        linux-bcache@...r.kernel.org, linux-raid@...r.kernel.org,
        nvdimm@...ts.linux.dev, konrad.wilk@...cle.com, joe.jin@...cle.com,
        ming.lei@...hat.com
Subject: Re: [RFC for-6.2/block V2] block: Change the granularity of io ticks
 from ms to ns

On Wed, Dec 07, 2022 at 10:32:04PM +0000, Gulam Mohamed wrote:
> As per the review comment from Jens Axboe, I am re-sending this patch
> against "for-6.2/block".
> 
> 
> Use ktime to change the granularity of IO accounting in the block layer
> from milliseconds to nanoseconds, so that latency values are reported
> correctly for devices whose latencies are in the microsecond range. After
> changing the granularity to nanoseconds, the iostat command, which was
> showing incorrect values for %util, now shows correct values.

Please add an explanation of why nanosecond granularity yields correct accounting.
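
(For illustration only, not taken from the patch: below is a minimal
userspace sketch of the effect under discussion. It assumes the pre-patch
jiffies-based accounting effectively charges a full 1ms tick for any
millisecond in which an IO is seen in flight, while nanosecond accounting
sums the actual service time. With ~70us IOs issued once every ~1ms, as in
the fio job and latency numbers quoted below, the two models report very
different %util.)

/*
 * Illustrative userspace sketch (not the kernel patch): compare IO
 * busy-time accounting at millisecond-tick granularity vs. nanosecond
 * granularity for a workload of ~70us IOs separated by a 1ms thinktime.
 * The millisecond model charges a whole 1ms tick for every millisecond
 * that an IO touches; the nanosecond model sums actual service time.
 */
#include <stdio.h>

int main(void)
{
	const long long io_ns        = 70 * 1000LL;              /* ~70us per IO */
	const long long thinktime_ns = 1000 * 1000LL;            /* 1ms idle gap */
	const long long runtime_ns   = 90LL * 1000 * 1000 * 1000; /* 90s run */

	long long now = 0, busy_ns = 0, busy_ms_ticks = 0, last_ms = -1;

	while (now + io_ns <= runtime_ns) {
		/* nanosecond accounting: sum the actual service time */
		busy_ns += io_ns;

		/* millisecond-tick accounting: charge each new ms the IO touches */
		for (long long ms = now / 1000000; ms <= (now + io_ns) / 1000000; ms++) {
			if (ms > last_ms) {
				busy_ms_ticks++;
				last_ms = ms;
			}
		}

		now += io_ns + thinktime_ns;
	}

	printf("ns accounting: %%util = %5.1f%%\n",
	       100.0 * busy_ns / runtime_ns);
	printf("ms accounting: %%util = %5.1f%%\n",
	       100.0 * busy_ms_ticks * 1000000 / runtime_ns);
	return 0;
}

Run as written, the nanosecond model reports a single-digit %util while the
millisecond model reports a figure close to 100%, which is in line with the
before/after numbers quoted in this thread.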

> 
> We have not yet worked on dropping the STAT_PRECISE_TIMESTAMPS logic.
> We will do that if this patch is acceptable.
> 
> The iostat command was run after starting fio with the following command
> on an NVMe disk. For the same fio job, iostat %util showed ~100% for
> disks whose latencies are in the microsecond range. With the kernel
> changes (nanosecond granularity), %util showed correct values. The
> details of the test and its output follow:
> 
> fio command
> -----------
> [global]
> bs=128K
> iodepth=1
> direct=1
> ioengine=libaio
> group_reporting
> time_based
> runtime=90
> thinktime=1ms
> numjobs=1
> name=raw-write
> rw=randrw
> ignore_error=EIO:EIO
> [job1]
> filename=/dev/nvme0n1
> 
> Correct values after kernel changes:
> ====================================
> iostat output
> -------------
> iostat -d /dev/nvme0n1 -x 1
> 
> Device            r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
> nvme0n1              0.08    0.05   0.06   128.00   128.00   0.07   6.50
> 
> Device            r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
> nvme0n1              0.08    0.06   0.06   128.00   128.00   0.07   6.30
> 
> Device            r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
> nvme0n1              0.06    0.05   0.06   128.00   128.00   0.06   5.70
> 
> From fio
> --------
> Read Latency: clat (usec): min=32, max=2335, avg=79.54, stdev=29.95
> Write Latency: clat (usec): min=38, max=130, avg=57.76, stdev= 3.25

Can you explain a bit why the above %util is correct?
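
(As a rough sanity check, assuming the iodepth=1 and thinktime=1ms settings
from the job file above: each IO, with an average completion latency of
roughly 58-80us, is followed by about 1ms of idle time, so the expected
utilization is approximately avg_lat / (avg_lat + thinktime), i.e. about
70us / 1070us, or ~6.5%, which is consistent with the ~6% iostat reports
here.)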

BTW, %util is usually not important for SSDs, please see 'man iostat':

     %util
            Percentage of elapsed time during which I/O requests were issued to the device
            (bandwidth utilization for the device). Device saturation occurs when this value
            is close to 100% for devices serving requests serially. But for devices serving
            requests in parallel, such as RAID arrays and modern SSDs, this number does not
            reflect their performance limits.


Thanks, 
Ming
