linux-ext4 - Re: AW: Slow I/O on USB media after commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <5bd51904b1b6511748c5454bce437bdc038eeb1f.camel@unipv.it>
Date:   Tue, 07 Jan 2020 08:51:41 +0100
From:   Andrea Vai <andrea.vai@...pv.it>
To:     Ming Lei <ming.lei@...hat.com>, "Theodore Y. Ts'o" <tytso@....edu>
Cc:     "Schmid, Carsten" <Carsten_Schmid@...tor.com>,
        Finn Thain <fthain@...egraphics.com.au>,
        Damien Le Moal <Damien.LeMoal@....com>,
        Alan Stern <stern@...land.harvard.edu>,
        Jens Axboe <axboe@...nel.dk>,
        Johannes Thumshirn <jthumshirn@...e.de>,
        USB list <linux-usb@...r.kernel.org>,
        SCSI development list <linux-scsi@...r.kernel.org>,
        Himanshu Madhani <himanshu.madhani@...ium.com>,
        Hannes Reinecke <hare@...e.com>,
        Omar Sandoval <osandov@...com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Hans Holmberg <Hans.Holmberg@....com>,
        Kernel development list <linux-kernel@...r.kernel.org>,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: AW: Slow I/O on USB media after commit
 f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6

Il giorno gio, 26/12/2019 alle 16.37 +0800, Ming Lei ha scritto:
> On Wed, Dec 25, 2019 at 10:30:57PM -0500, Theodore Y. Ts'o wrote:
> > On Thu, Dec 26, 2019 at 10:27:02AM +0800, Ming Lei wrote:
> > > Maybe we need to be careful for HDD., since the request count in
> scheduler
> > > queue is double of in-flight request count, and in theory NCQ
> should only
> > > cover all in-flight 32 requests. I will find a sata HDD., and
> see if
> > > performance drop can be observed in the similar 'cp' test.
> > 
> > Please try to measure it, but I'd be really surprised if it's
> > significant with with modern HDD's.
> 
> Just find one machine with AHCI SATA, and run the following xfs
> overwrite test:
> 
> #!/bin/bash
> DIR=$1
> echo 3 > /proc/sys/vm/drop_caches
> fio --readwrite=write --filesize=5g --overwrite=1 --
> filename=$DIR/fiofile \
>         --runtime=60s --time_based --ioengine=psync --direct=0 --
> bs=4k
> 		--iodepth=128 --numjobs=2 --group_reporting=1 --
> name=overwrite
> 
> FS is xfs, and disk is LVM over AHCI SATA with NCQ(depth 32),
> because the
> machine is picked up from RH beaker, and it is the only disk in the
> box.
> 
> #lsblk
> NAME                            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda                               8:0    0 931.5G  0 disk 
> ├─sda1                            8:1    0     1G  0 part /boot
> └─sda2                            8:2    0 930.5G  0 part 
>   ├─rhel_hpe--ml10gen9--01-root 253:0    0    50G  0 lvm  /
>   ├─rhel_hpe--ml10gen9--01-swap 253:1    0   3.9G  0 lvm  [SWAP]
>   └─rhel_hpe--ml10gen9--01-home 253:2    0 876.6G  0 lvm  /home
> 
> 
> kernel: 3a7ea2c483a53fc("scsi: provide mq_ops->busy() hook") which
> is
> the previous commit of f664a3cc17b7 ("scsi: kill off the legacy IO
> path").
> 
>             |scsi_mod.use_blk_mq=N |scsi_mod.use_blk_mq=Y |
> -----------------------------------------------------------
> throughput: |244MB/s               |169MB/s               |
> -----------------------------------------------------------
> 
> Similar result can be observed on v5.4 kernel(184MB/s) with same
> test
> steps.
> 
> 
> > That because they typically have
> > a queue depth of 16, and a max_sectors_kb of 32767 (e.g., just
> under
> > 32 MiB).  Sort seeks are typically 1-2 ms, with full stroke seeks
> > 8-10ms.  Typical sequential write speeds on a 7200 RPM drive is
> > 125-150 MiB/s.  So suppose every other request sent to the HDD is
> from
> > the other request stream.  The disk will chose the 8 requests from
> its
> > queue that are contiguous, and so it will be writing around 256
> MiB,
> > which will take 2-3 seconds.  If it then needs to spend between 1
> and
> > 10 ms seeking to another location of the disk, before it writes
> the
> > next 256 MiB, the worst case overhead of that seek is 10ms / 2s,
> or
> > 0.5%.  That may very well be within your measurements' error bars.
> 
> Looks you assume that disk seeking just happens once when writing
> around
> 256MB. This assumption may not be true, given all data can be in
> page
> cache before writing. So when two tasks are submitting IOs
> concurrently,
> IOs from each single task is sequential, and NCQ may order the
> current batch
> submitted from the two streams. However disk seeking may still be
> needed
> for the next batch handled by NCQ.
> 
> > And of course, note that in real life, we are very *often* writing
> to
> > multiple files in parallel, for example, during a "make -j16"
> while
> > building the kernel.  Writing a single large file is certainly
> > something people do (but even there people who are burning a 4G
> DVD
> > rip are often browsing the web while they are waiting for it to
> > complete, and the browser will be writing cache files, etc.).  So
> > whether or not this is something where we should be stressing over
> > this specific workload is going to be quite debateable.
> 

Hi,
  is there any update on this? Sorry if I am making noise, but I would
like to help to improve the kernel (or fix it) if I can help.
Otherwise, please let me know how to consider this case,

Thanks, and bye
Andrea