Date:   Wed, 1 Jan 2020 21:53:31 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     "Theodore Y. Ts'o" <tytso@....edu>,
        Andrea Vai <andrea.vai@...pv.it>,
        "Schmid, Carsten" <Carsten_Schmid@...tor.com>,
        Finn Thain <fthain@...egraphics.com.au>,
        Damien Le Moal <Damien.LeMoal@....com>,
        Alan Stern <stern@...land.harvard.edu>,
        Jens Axboe <axboe@...nel.dk>,
        Johannes Thumshirn <jthumshirn@...e.de>,
        USB list <linux-usb@...r.kernel.org>,
        SCSI development list <linux-scsi@...r.kernel.org>,
        Himanshu Madhani <himanshu.madhani@...ium.com>,
        Hannes Reinecke <hare@...e.com>,
        Omar Sandoval <osandov@...com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Hans Holmberg <Hans.Holmberg@....com>,
        Kernel development list <linux-kernel@...r.kernel.org>,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: slow IO on USB media

On Wed, Jan 01, 2020 at 03:43:10PM +0800, Hillf Danton wrote:
> 
> On Thu, 26 Dec 2019 16:37:06 +0800 Ming Lei wrote:
> > On Wed, Dec 25, 2019 at 10:30:57PM -0500, Theodore Y. Ts'o wrote:
> > > On Thu, Dec 26, 2019 at 10:27:02AM +0800, Ming Lei wrote:
> > > > Maybe we need to be careful for HDD, since the request count in scheduler
> > > > queue is double of in-flight request count, and in theory NCQ should only
> > > > cover all in-flight 32 requests. I will find a SATA HDD, and see if
> > > > performance drop can be observed in the similar 'cp' test.
> > >
> > > Please try to measure it, but I'd be really surprised if it's
> > > significant with modern HDD's.
> > 
> > Just found one machine with AHCI SATA, and ran the following xfs
> > overwrite test:
> > 
> > #!/bin/bash
> > DIR=$1
> > echo 3 > /proc/sys/vm/drop_caches
> > fio --readwrite=write --filesize=5g --overwrite=1 --filename=$DIR/fiofile \
> >         --runtime=60s --time_based --ioengine=psync --direct=0 --bs=4k \
> > 		--iodepth=128 --numjobs=2 --group_reporting=1 --name=overwrite
> > 
> > FS is xfs, and disk is LVM over AHCI SATA with NCQ(depth 32), because the
> > machine is picked up from RH beaker, and it is the only disk in the box.
> > 
> > #lsblk
> > NAME                            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> > sda                               8:0    0 931.5G  0 disk
> > ├─sda1                            8:1    0     1G  0 part /boot
> > └─sda2                            8:2    0 930.5G  0 part
> >   ├─rhel_hpe--ml10gen9--01-root 253:0    0    50G  0 lvm  /
> >   ├─rhel_hpe--ml10gen9--01-swap 253:1    0   3.9G  0 lvm  [SWAP]
> >   └─rhel_hpe--ml10gen9--01-home 253:2    0 876.6G  0 lvm  /home
> > 
> > 
> > kernel: 3a7ea2c483a53fc("scsi: provide mq_ops->busy() hook") which is
> > the previous commit of f664a3cc17b7 ("scsi: kill off the legacy IO path").
> > 
> >             |scsi_mod.use_blk_mq=N |scsi_mod.use_blk_mq=Y |
> > -----------------------------------------------------------
> > throughput: |244MB/s               |169MB/s               |
> > -----------------------------------------------------------
> > 
> > A similar result can be observed on the v5.4 kernel (184MB/s) with the
> > same test steps.
> 
> 
> The simple diff makes direct issue of requests take pending requests
> also into account and goes the normal enqueue-and-dequeue path if any
> pending requests exist.
> 
> Then it sorts requests regardless of the number of hardware queues in a
> bid to keep requests as sequential as they are. Let's see if the
> added sorting cost can make any sense.
> 
> --->8---
> 
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -410,6 +410,11 @@ run:
>  		blk_mq_run_hw_queue(hctx, async);
>  }
>  
> +static inline bool blk_mq_sched_hctx_has_pending_rq(struct blk_mq_hw_ctx *hctx)
> +{
> +	return sbitmap_any_bit_set(&hctx->ctx_map);
> +}
> +
>  void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
>  				  struct blk_mq_ctx *ctx,
>  				  struct list_head *list, bool run_queue_async)
> @@ -433,7 +438,8 @@ void blk_mq_sched_insert_requests(struct
>  		 * busy in case of 'none' scheduler, and this way may save
>  		 * us one extra enqueue & dequeue to sw queue.
>  		 */
> -		if (!hctx->dispatch_busy && !e && !run_queue_async) {
> +		if (!hctx->dispatch_busy && !e && !run_queue_async &&
> +		    !blk_mq_sched_hctx_has_pending_rq(hctx)) {
>  			blk_mq_try_issue_list_directly(hctx, list);
>  			if (list_empty(list))
>  				goto out;
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1692,7 +1692,7 @@ void blk_mq_flush_plug_list(struct blk_p
>  
>  	list_splice_init(&plug->mq_list, &list);
>  
> -	if (plug->rq_count > 2 && plug->multiple_queues)
> +	if (plug->rq_count > 1)
>  		list_sort(NULL, &list, plug_rq_cmp);
>  
>  	plug->rq_count = 0;

I guess you may not have understood the cause: the issue is related
to neither MQ nor plug.

An AHCI/SATA drive is single queue, and for HDDs, IO throughput is very
sensitive to IO order in the case of sequential IO.

The legacy IO path supports ioc batching and BDI queue congestion. When
there is more than one writeback IO path, there may be only one
active IO submission path while the others are blocked by
ioc batching, so writeback IO is still dispatched to disk in strict
IO order.

But ioc batching and BDI queue congestion were killed when converting to
blk-mq.
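
As a rough illustration of why dispatch order matters here, the toy
Python model below counts how often the disk sees a non-contiguous
sector when two sequential writers are interleaved, versus when each
writer issues a run of requests while the other waits. It is only a
sketch: the function names, sector numbers and the batch size of 32 are
assumptions made up for the example, and it does not model the real
ioc batching code in the legacy path.

```python
from itertools import zip_longest


def sequential_stream(start, count):
    """Sector numbers issued by one writer doing sequential IO (hypothetical)."""
    return list(range(start, start + count))


def interleaved_dispatch(streams):
    """No batching: requests from all submitters reach the disk round-robin."""
    order = []
    for group in zip_longest(*streams):
        order.extend(s for s in group if s is not None)
    return order


def batched_dispatch(streams, batch):
    """Toy batching: one submitter issues `batch` requests while the others wait."""
    order = []
    positions = [0] * len(streams)
    while any(pos < len(s) for pos, s in zip(positions, streams)):
        for i, s in enumerate(streams):
            chunk = s[positions[i]:positions[i] + batch]
            order.extend(chunk)
            positions[i] += len(chunk)
    return order


def count_seeks(order):
    """Count dispatches that do not continue at the previous sector + 1."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur != prev + 1)


# Two writers, each sequential within its own (assumed) region of the disk.
streams = [sequential_stream(0, 64), sequential_stream(1_000_000, 64)]
seeks_interleaved = count_seeks(interleaved_dispatch(streams))
seeks_batched = count_seeks(batched_dispatch(streams, batch=32))
print("interleaved seeks:", seeks_interleaved)
print("batched seeks:    ", seeks_batched)
```

In this model the interleaved order breaks sequentiality on almost
every request, while the batched order only breaks it when the active
submitter changes.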

Please see the following IO trace with the legacy IO request path:

https://lore.kernel.org/linux-scsi/f82fd5129e3dcacae703a689be60b20a7fedadf6.camel@unipv.it/2-log_ming_20191128_182751.zip


Thanks,
Ming
