Date: Tue, 12 Mar 2024 15:47:44 +0530
From: Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>
To: 牛志国 (Zhiguo Niu) <Zhiguo.Niu@...soc.com>,
        "bvanassche@....org" <bvanassche@....org>,
        Jens Axboe <axboe@...nel.dk>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
        Ramanan Govindarajan <ramanan.govindarajan@...cle.com>,
        Paul Webb <paul.x.webb@...cle.com>,
        "nicky.veitch@...cle.com" <nicky.veitch@...cle.com>,
        邢云龙 (Yunlong Xing) <Yunlong.Xing@...soc.com>,
        金红宇 (Hongyu Jin) <hongyu.jin@...soc.com>,
        Darren Kenny <darren.kenny@...cle.com>
Subject: Re: [bug-report] Performance regression with fio sequential-write on a multipath setup.

Hi Zhiguo,


On 07/03/24 08:25, 牛志国 (Zhiguo Niu) wrote:
> Hi Harshit Mogalapalli
> 
> What is the queue_depth of queue of your storage device?
> Under the same test conditions, what are the results of sequential reads?
> 

Thanks for the response.

Queue depth of the storage device is 254.

And here is the sequential-read data:

6.8-rc7: 2 block devices with multi-path:
----------------------------------------
Run status group 0 (all jobs):
    READ: bw=448MiB/s (470MB/s), 448MiB/s-448MiB/s (470MB/s-470MB/s), io=263GiB (282GB), run=600311-600311msec

Disk stats (read/write):
     dm-1: ios=418480/0, merge=642066/0, ticks=143492597/0, in_queue=143492597, util=98.28%, aggrios=287904/0, aggrmerge=0/0, aggrticks=71063414/0, aggrin_queue=71063414, aggrutil=86.71%
   sdf: ios=575809/0, merge=0/0, ticks=142126829/0, in_queue=142126829, util=86.71%
   sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
     dm-12: ios=422296/0, merge=667474/0, ticks=143680598/0, in_queue=143680598, util=98.95%, aggrios=288787/0, aggrmerge=0/0, aggrticks=71153453/0, aggrin_queue=71153453, aggrutil=86.72%
   sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
   sdg: ios=577574/0, merge=0/0, ticks=142306906/0, in_queue=142306906, util=86.72%

Throughput Results:
READ:470:3582:0



6.8-rc7 + revert: 2 block devices with multi-path:
-------------------------------------------------
Run status group 0 (all jobs):
    READ: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=271GiB (291GB), run=600298-600298msec

Disk stats (read/write):
     dm-1: ios=421574/0, merge=692148/0, ticks=143444547/0, in_queue=143444547, util=99.19%, aggrios=288316/0, aggrmerge=0/0, aggrticks=71080370/0, aggrin_queue=71080370, aggrutil=87.08%
   sdf: ios=576633/0, merge=0/0, ticks=142160740/0, in_queue=142160740, util=87.08%
   sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
     dm-12: ios=432589/0, merge=672001/0, ticks=142976262/0, in_queue=142976262, util=99.03%, aggrios=293051/0, aggrmerge=0/0, aggrticks=70886007/0, aggrin_queue=70886007, aggrutil=87.03%
   sdae: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
   sdg: ios=586102/0, merge=0/0, ticks=141772015/0, in_queue=141772015, util=87.03%

Throughput Results:
READ:484:3695:0


On average, over 4 iterations:

on 6.8-rc7 : 3571
on 6.8-rc7 + revert : 3634

There is almost no regression on sequential read, while there is a 
significant regression on sequential write; this is consistent with 
the commit throttling only (async) write requests.


Thanks,
Harshit
> Thanks!
> -----Original Message-----
> From: Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>
> Sent: March 7, 2024, 2:46
> To: 牛志国 (Zhiguo Niu) <Zhiguo.Niu@...soc.com>; bvanassche@....org; Jens Axboe <axboe@...nel.dk>; linux-block@...r.kernel.org
> Cc: LKML <linux-kernel@...r.kernel.org>; Ramanan Govindarajan <ramanan.govindarajan@...cle.com>; Paul Webb <paul.x.webb@...cle.com>; nicky.veitch@...cle.com
> Subject: [bug-report] Performance regression with fio sequential-write on a multipath setup.
> 
> 
> 
> 
> 
> Hi,
> 
> We have noticed a performance regression in the kernel with an fio sequential-write job.
> 
> Notes and observations:
> ======================
> 1. This is observed on recent kernels (6.6) when compared with 5.15.y; the bisection points to commit d47f9717e5cf ("block/mq-deadline: use correct way to throttling write requests"). A sketch of the relevant change follows this list.
> 2. Reverting the above commit improves the performance.
> 3. This regression can also be seen on 6.8-rc7, and a revert on top of that fixes the regression.
> 4. The commit looks very much related to the cause of the regression.
> 5. Note that this happens only with a multipath setup, even with just 2 block devices.
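> 
> For context, as we read commit d47f9717e5cf, it changes the async
> (write) depth limit computed in dd_depth_updated() in
> block/mq-deadline.c from 3/4 of the queue's nr_requests to 3/4 of the
> sbitmap word depth (1 << shift), so that the shallow-depth throttling
> actually takes effect. A standalone sketch of the two formulas (this
> is illustration only, not the kernel hunk; nr_requests = 256 and
> shift = 6 are assumed typical values, not measured on our setup):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>         unsigned long nr_requests = 256; /* assumed scheduler tag depth */
>         unsigned int shift = 6;          /* assumed sbitmap word shift */
> 
>         /* Before d47f9717e5cf: 3/4 of nr_requests (192 here), larger
>          * than the sbitmap word depth, so writes are effectively
>          * unthrottled (the kernel also clamps this to at least 1). */
>         unsigned long before = 3 * nr_requests / 4;
> 
>         /* After d47f9717e5cf: 3/4 of the word depth (48 here), so
>          * write throttling actually kicks in. */
>         unsigned int after = 3 * (1U << shift) / 4;
> 
>         printf("async_depth before=%lu after=%u\n", before, after);
>         return 0;
> }
> 
> If that reading is right, the drop in the allowed async tag depth from
> ~192 to ~48 would explain why only the write side regresses.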
> 
> Test details:
> ============
> (A) fio.write job
> 
> fio-3.19 -- fio version
> 
> [global]
> ioengine=libaio
> rw=write
> bs=128k
> iodepth=64
> numjobs=24
> direct=1
> fsync=1
> runtime=600
> group_reporting
> 
> [job]
> filename=/dev/dm-0
> [job]
> filename=/dev/dm-1
> 
> Each disk is 600G in size.
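> 
> (With two [job] sections and numjobs=24, fio starts 2 * 24 = 48 worker processes in total, each submitting at iodepth=64; this matches the "Starting 48 processes" line in the results below.)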
> 
> (B) Test results
> 
> 6.8-rc7: 2 block devices with multi-path
> -------
> 
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> fio-3.19
> Starting 48 processes
> 
> job: (groupid=0, jobs=48): err= 0: pid=6164: Wed Mar  6 17:58:33 2024
>     write: IOPS=1884, BW=236MiB/s (247MB/s)(138GiB/600319msec); 0 zone resets
>       slat (usec): min=2, max=540462, avg=25445.35, stdev=24181.85
>       clat (msec): min=9, max=4941, avg=1602.56, stdev=339.05
>        lat (msec): min=9, max=4973, avg=1628.00, stdev=342.19
>       clat percentiles (msec):
>        |  1.00th=[  986],  5.00th=[ 1167], 10.00th=[ 1250], 20.00th=[ 1368],
>        | 30.00th=[ 1435], 40.00th=[ 1502], 50.00th=[ 1569], 60.00th=[ 1636],
>        | 70.00th=[ 1703], 80.00th=[ 1804], 90.00th=[ 1955], 95.00th=[ 2140],
>        | 99.00th=[ 2869], 99.50th=[ 3239], 99.90th=[ 3842], 99.95th=[ 4010],
>        | 99.99th=[ 4329]
>      bw (  KiB/s): min=47229, max=516492, per=100.00%, avg=241546.47, stdev=1326.92, samples=57259
>      iops        : min=  322, max= 3996, avg=1843.17, stdev=10.39, samples=57259
>     lat (msec)   : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=0.02%
>     lat (msec)   : 500=0.06%, 750=0.14%, 1000=0.93%, 2000=90.41%, >=2000=8.42%
>     fsync/fdatasync/sync_file_range:
>       sync (nsec): min=10, max=57940, avg=104.23, stdev=498.86
>       sync percentiles (nsec):
>        |  1.00th=[   13],  5.00th=[   19], 10.00th=[   26], 20.00th=[   61],
>        | 30.00th=[   68], 40.00th=[   72], 50.00th=[   75], 60.00th=[   78],
>        | 70.00th=[   87], 80.00th=[  167], 90.00th=[  175], 95.00th=[  177],
>        | 99.00th=[  221], 99.50th=[  231], 99.90th=[  318], 99.95th=[15680],
>        | 99.99th=[17792]
>     cpu          : usr=0.08%, sys=0.16%, ctx=1096948, majf=0, minf=1995
>     IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.5%
>        submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>        complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,  >=64=0.0%
>        issued rwts: total=0,1131018,0,1127994 short=0,0,0,0 dropped=0,0,0,0
>        latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>     WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), io=138GiB (148GB), run=600319-600319msec
> 
> Disk stats (read/write):
>       dm-0: ios=50/533034, merge=0/27056, ticks=16/113070163, in_queue=113070180, util=100.00%, aggrios=43/266595, aggrmerge=0/0, aggrticks=156/56542549, aggrin_queue=56542706, aggrutil=100.00%
>     sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>     sde: ios=86/533191, merge=0/0, ticks=313/113085099, in_queue=113085413, util=100.00%
>       dm-1: ios=5/534381, merge=0/36389, ticks=240/113110344, in_queue=113110584, util=100.00%, aggrios=7/267191, aggrmerge=0/0, aggrticks=153/56543654, aggrin_queue=56543807, aggrutil=100.00%
>     sdf: ios=14/534382, merge=0/0, ticks=306/113087308, in_queue=113087615, util=100.00%
>     sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
> 
> Throughput Results:
> WRITE:247:1884:0
> 
> 
> 6.8-rc7 + revert: 2 block devices with multi-path
> -------
> 
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> job: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=64 ...
> fio-3.19
> Starting 48 processes
> 
> job: (groupid=0, jobs=48): err= 0: pid=6104: Wed Mar  6 18:29:13 2024
>     write: IOPS=2518, BW=315MiB/s (330MB/s)(185GiB/600339msec); 0 zone resets
>       slat (usec): min=2, max=923472, avg=6789.22, stdev=20329.20
>       clat (msec): min=4, max=6020, avg=1212.68, stdev=714.90
>        lat (msec): min=4, max=6020, avg=1219.47, stdev=718.40
>       clat percentiles (msec):
>        |  1.00th=[  203],  5.00th=[  309], 10.00th=[  384], 20.00th=[  535],
>        | 30.00th=[  709], 40.00th=[  911], 50.00th=[ 1133], 60.00th=[ 1334],
>        | 70.00th=[ 1519], 80.00th=[ 1754], 90.00th=[ 2198], 95.00th=[ 2601],
>        | 99.00th=[ 3171], 99.50th=[ 3608], 99.90th=[ 4329], 99.95th=[ 4597],
>        | 99.99th=[ 5134]
>      bw (  KiB/s): min=12237, max=1834896, per=100.00%, avg=413187.52, stdev=6322.04, samples=44948
>      iops        : min=   48, max=14314, avg=3186.68, stdev=49.49, samples=44948
>     lat (msec)   : 10=0.01%, 20=0.01%, 50=0.09%, 100=0.02%, 250=2.28%
>     lat (msec)   : 500=15.45%, 750=14.26%, 1000=11.83%, 2000=42.52%, >=2000=13.55%
>     fsync/fdatasync/sync_file_range:
>       sync (nsec): min=10, max=76066, avg=57.85, stdev=299.52
>       sync percentiles (nsec):
>        |  1.00th=[   13],  5.00th=[   14], 10.00th=[   15], 20.00th=[   16],
>        | 30.00th=[   17], 40.00th=[   20], 50.00th=[   28], 60.00th=[   47],
>        | 70.00th=[   65], 80.00th=[   80], 90.00th=[  103], 95.00th=[  175],
>        | 99.00th=[  237], 99.50th=[  241], 99.90th=[  262], 99.95th=[  318],
>        | 99.99th=[16512]
>     cpu          : usr=0.06%, sys=0.07%, ctx=531434, majf=0, minf=728
>     IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=199.6%
>        submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>        complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,  >=64=0.0%
>        issued rwts: total=0,1511918,0,1508894 short=0,0,0,0 dropped=0,0,0,0
>        latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>     WRITE: bw=315MiB/s (330MB/s), 315MiB/s-315MiB/s (330MB/s-330MB/s), io=185GiB (198GB), run=600339-600339msec
> 
> Disk stats (read/write):
>       dm-0: ios=0/246318, merge=0/493981, ticks=0/142584585, in_queue=142584586, util=99.17%, aggrios=6/181454, aggrmerge=0/0, aggrticks=112/70608689, aggrin_queue=70608801, aggrutil=84.92%
>     sdac: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>     sde: ios=12/362908, merge=0/0, ticks=224/141217379, in_queue=141217603, util=84.92%
>       dm-1: ios=0/233211, merge=0/538097, ticks=0/142579042, in_queue=142579043, util=99.15%, aggrios=8/174475, aggrmerge=0/0, aggrticks=128/70654686, aggrin_queue=70654814, aggrutil=85.20%
>     sdf: ios=16/348951, merge=0/0, ticks=256/141309372, in_queue=141309628, util=85.20%
>     sdad: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
> 
> Throughput Results:
> WRITE:330:2518:0
> 
> (C) performance difference:
> 
> That is roughly a 33.65% difference in write performance ((2518 - 1884) / 1884 IOPS ≈ 33.65%); this is reproducible with a higher number of block devices as well.
> 
> 
> 
> Thanks to Paul Webb for identifying this regression and sharing the details.
> We would be happy to test any patches to check the change in performance, and to follow any suggestions.
> 
> 
> Thanks,
> Harshit

