Message-ID: <5bd51904b1b6511748c5454bce437bdc038eeb1f.camel@unipv.it>
Date: Tue, 07 Jan 2020 08:51:41 +0100
From: Andrea Vai <andrea.vai@...pv.it>
To: Ming Lei <ming.lei@...hat.com>, "Theodore Y. Ts'o" <tytso@....edu>
Cc: "Schmid, Carsten" <Carsten_Schmid@...tor.com>,
Finn Thain <fthain@...egraphics.com.au>,
Damien Le Moal <Damien.LeMoal@....com>,
Alan Stern <stern@...land.harvard.edu>,
Jens Axboe <axboe@...nel.dk>,
Johannes Thumshirn <jthumshirn@...e.de>,
USB list <linux-usb@...r.kernel.org>,
SCSI development list <linux-scsi@...r.kernel.org>,
Himanshu Madhani <himanshu.madhani@...ium.com>,
Hannes Reinecke <hare@...e.com>,
Omar Sandoval <osandov@...com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Greg KH <gregkh@...uxfoundation.org>,
Hans Holmberg <Hans.Holmberg@....com>,
Kernel development list <linux-kernel@...r.kernel.org>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: AW: Slow I/O on USB media after commit
f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
On Thu, 26/12/2019 at 16:37 +0800, Ming Lei wrote:
> On Wed, Dec 25, 2019 at 10:30:57PM -0500, Theodore Y. Ts'o wrote:
> > On Thu, Dec 26, 2019 at 10:27:02AM +0800, Ming Lei wrote:
> > > Maybe we need to be careful with HDDs, since the request count in the
> > > scheduler queue is double the in-flight request count, and in theory
> > > NCQ should only cover all 32 in-flight requests. I will find a SATA
> > > HDD and see if a performance drop can be observed in a similar 'cp'
> > > test.
> >
> > Please try to measure it, but I'd be really surprised if it's
> > significant with modern HDDs.
>
> I just found one machine with AHCI SATA and ran the following xfs
> overwrite test:
>
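> #!/bin/bash
> # Target directory on the filesystem under test
> DIR=$1
> # Drop the page cache so every run starts cold
> echo 3 > /proc/sys/vm/drop_caches
> # Two buffered 4k sequential overwrite jobs, run for 60 seconds
> fio --readwrite=write --filesize=5g --overwrite=1 \
>     --filename="$DIR/fiofile" \
>     --runtime=60s --time_based --ioengine=psync --direct=0 --bs=4k \
>     --iodepth=128 --numjobs=2 --group_reporting=1 --name=overwrite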
>
> FS is xfs, and the disk is LVM over AHCI SATA with NCQ (depth 32),
> because the machine was picked up from RH beaker and it is the only
> disk in the box.
>
> #lsblk
> NAME                            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda                               8:0    0 931.5G  0 disk
> ├─sda1                            8:1    0     1G  0 part /boot
> └─sda2                            8:2    0 930.5G  0 part
>   ├─rhel_hpe--ml10gen9--01-root 253:0    0    50G  0 lvm  /
>   ├─rhel_hpe--ml10gen9--01-swap 253:1    0   3.9G  0 lvm  [SWAP]
>   └─rhel_hpe--ml10gen9--01-home 253:2    0 876.6G  0 lvm  /home
>
>
> kernel: 3a7ea2c483a53fc ("scsi: provide mq_ops->busy() hook"), which is
> the commit immediately before f664a3cc17b7 ("scsi: kill off the legacy
> IO path").
>
>             | scsi_mod.use_blk_mq=N | scsi_mod.use_blk_mq=Y |
> -------------------------------------------------------------
> throughput: | 244MB/s               | 169MB/s               |
> -------------------------------------------------------------
>
> A similar result can be observed on the v5.4 kernel (184MB/s) with the
> same test steps.
>
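To put the quoted figures in relative terms, here is a minimal sketch that computes the percentage drop from the legacy-path result. The helper name `pct_drop` is made up for illustration, and comparing the v5.4 number against the 244MB/s baseline is only a rough guide, since the two come from different kernels.

```shell
#!/bin/bash
# Force the C locale so awk prints "." as the decimal separator.
export LC_ALL=C

# pct_drop BASELINE MEASURED -> throughput drop as a percentage of BASELINE.
# (Hypothetical helper, not part of the original test script.)
pct_drop() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# Figures quoted in the message, in MB/s.
echo "blk-mq vs legacy path: $(pct_drop 244 169)% slower"
echo "v5.4 vs legacy path:   $(pct_drop 244 184)% slower (different kernel, rough)"
```

By this measure the regression is far outside the sub-1% range, so it is unlikely to be measurement noise.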
>
> > That's because they typically have
> > a queue depth of 16, and a max_sectors_kb of 32767 (i.e., just under
> > 32 MiB). Short seeks are typically 1-2 ms, with full stroke seeks
> > 8-10ms. Typical sequential write speeds on a 7200 RPM drive are
> > 125-150 MiB/s. So suppose every other request sent to the HDD is from
> > the other request stream. The disk will choose the 8 requests from its
> > queue that are contiguous, and so it will be writing around 256 MiB,
> > which will take 2-3 seconds. If it then needs to spend between 1 and
> > 10 ms seeking to another location of the disk, before it writes the
> > next 256 MiB, the worst case overhead of that seek is 10ms / 2s, or
> > 0.5%. That may very well be within your measurements' error bars.
>
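The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of that back-of-the-envelope model: the function names are invented for illustration, and the 128 MiB/s write rate is an assumed value picked from inside the quoted 125-150 MiB/s range so that the batch takes the 2 s used in the estimate.

```shell
#!/bin/bash
# Force the C locale so awk prints "." as the decimal separator.
export LC_ALL=C

# batch_mib REQUESTS REQUEST_MIB -> MiB written per contiguous batch
batch_mib() {
    awk -v n="$1" -v sz="$2" 'BEGIN { printf "%d\n", n * sz }'
}

# write_seconds MIB MIB_PER_SEC -> seconds needed to write the batch
write_seconds() {
    awk -v mib="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", mib / rate }'
}

# seek_overhead_pct SEEK_MS WRITE_SECONDS -> one seek as a percentage of write time
seek_overhead_pct() {
    awk -v seek="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", (seek / 1000) / secs * 100 }'
}

batch=$(batch_mib 8 32)              # 8 contiguous requests of about 32 MiB each
secs=$(write_seconds "$batch" 128)   # 128 MiB/s is an assumption, see lead-in
echo "batch: ${batch} MiB, write time: ${secs} s"
echo "worst-case seek overhead: $(seek_overhead_pct 10 "$secs")%"
```

The model charges exactly one seek per 256 MiB batch; changing the seek count or batch size in the calls shows how sensitive the overhead figure is to that assumption.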
> It looks like you assume that disk seeking happens just once when
> writing around 256MB. This assumption may not be true, given that all
> data can be in the page cache before writing. So when two tasks are
> submitting IOs concurrently, the IOs from each single task are
> sequential, and NCQ may order the current batch submitted from the two
> streams. However, disk seeking may still be needed for the next batch
> handled by NCQ.
>
> > And of course, note that in real life, we are very *often* writing to
> > multiple files in parallel, for example, during a "make -j16" while
> > building the kernel. Writing a single large file is certainly
> > something people do (but even there people who are burning a 4G DVD
> > rip are often browsing the web while they are waiting for it to
> > complete, and the browser will be writing cache files, etc.). So
> > whether or not this is something where we should be stressing over
> > this specific workload is going to be quite debatable.
>
Hi,
is there any update on this? Sorry if I am making noise, but I would
like to help improve (or fix) the kernel if I can.
Otherwise, please let me know how I should treat this case.
Thanks, and bye
Andrea