Message-ID: <20180621011656.GA15427@ming.t460p>
Date:   Thu, 21 Jun 2018 09:17:05 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Gi-Oh Kim <gi-oh.kim@...fitbricks.com>
Cc:     Jens Axboe <axboe@...com>, hch@...radead.org,
        Al Viro <viro@...iv.linux.org.uk>, kent.overstreet@...il.com,
        dsterba@...e.cz, ying.huang@...el.com,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, tytso@....edu,
        darrick.wong@...cle.com, colyli@...e.de, fdmanana@...il.com,
        rdunlap@...radead.org
Subject: Re: [PATCH V6 00/30] block: support multipage bvec

On Fri, Jun 15, 2018 at 02:59:19PM +0200, Gi-Oh Kim wrote:
> >
> > - bio size can be increased and it should improve some high-bandwidth IO
> > case in theory[4].
> >
> 
> Hi,
> 
> I would like to report that your patch set works well on my system, which is based on v4.14.48.
> I thought the multipage bvec could improve the performance of my system.
> (FYI, my system runs v4.14.48 and provides a KVM-based virtualization service.)

Thanks for your test!

> 
> So I back-ported your patches to v4.14.48.
> It went without any serious problems.
> I only needed to cherry-pick the "blk-merge: compute
> bio->bi_seg_front_size efficiently" and
> "block: move bio_alloc_pages() to bcache" patches before back-porting
> to prevent conflicts.

Not sure I understand your point; you have to backport all of the patches.

> And I ran my own test suite to check the features of the md and RAID1 layers.
> There was no problem. All test cases passed.
> (If you want, I will send you the back-ported patches.)
> 
> Then I ran the following two performance tests.
> To state the conclusion first, I was not able to show a performance
> improvement from the patch set.
> Of course, my test cases may not be suitable for testing your patch set,
> or maybe I ran the tests incorrectly.
> Please let me know which tools are suitable, and I will try them.
> 
> 1. fio
> 
> First I ran fio against the null block device to check the performance of the block layer.
> I am not sure this test is suitable for showing a performance
> improvement or degradation.
> Nevertheless, there was a small (-6%) performance degradation.
> 
> If it is not too much trouble, please review my fio options and
> let me know if any of them are wrong or inappropriate.
> Then I will run the test again.
> 
> 1.1 Following are my fio options.
> 
> gkim@ib1:~/pb-ltp/benchmark/fio$ cat go_local.sh
> #!/bin/bash
> echo "fio start   : $(date)"
> echo "kernel info : $(uname -a)"
> echo "fio version : $(fio --version)"
> 
> # set "none" io-scheduler
> modprobe -r null_blk
> modprobe null_blk
> echo "none" > /sys/block/nullb0/queue/scheduler
> 
> FIO_OPTION="--direct=1 --rw=randrw:2 --time_based=1 --group_reporting \
>             --ioengine=libaio --iodepth=64 --name=fiotest --numjobs=8 \
>             --bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4 \
>             --fadvise_hint=0 --iodepth_batch_submit=64 \
>             --iodepth_batch_complete=64"
> # fio tests the null_blk device, so it does not need to run for long.
> fio $FIO_OPTION --filename=/dev/nullb0 --runtime=600
> 
> 1.2 Following is the result before porting.
> 
> fio start   : Mon Jun 11 04:30:01 CEST 2018
> kernel info : Linux ib1 4.14.48-1-pserver
> #4.14.48-1.1+feature+daily+update+20180607.0857+1bbde0b~deb8 SMP
> x86_64 GNU/Linux
> fio version : fio-2.2.10
> fiotest: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K,
> ioengine=libaio, iodepth=64
> ...
> fio-2.2.10
> Starting 8 processes
> 
> fiotest: (groupid=0, jobs=8): err= 0: pid=1655: Mon Jun 11 04:40:02 2018
>   read : io=7133.2GB, bw=12174MB/s, iops=1342.1K, runt=600001msec
>     slat (usec): min=1, max=15750, avg=123.78, stdev=153.79
>     clat (usec): min=0, max=15758, avg=24.70, stdev=77.93
>      lat (usec): min=2, max=15782, avg=148.49, stdev=167.54
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
>      | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    6],
>      | 70.00th=[   22], 80.00th=[   36], 90.00th=[   72], 95.00th=[  107],
>      | 99.00th=[  173], 99.50th=[  203], 99.90th=[  932], 99.95th=[ 1416],
>      | 99.99th=[ 2960]
>     bw (MB  /s): min= 1096, max= 2147, per=12.51%, avg=1522.69, stdev=253.89
>   write: io=7131.3GB, bw=12171MB/s, iops=1343.6K, runt=600001msec
>     slat (usec): min=1, max=15751, avg=124.73, stdev=154.11
>     clat (usec): min=0, max=15758, avg=24.69, stdev=77.84
>      lat (usec): min=2, max=15780, avg=149.43, stdev=167.82
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
>      | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    6],
>      | 70.00th=[   22], 80.00th=[   36], 90.00th=[   72], 95.00th=[  107],
>      | 99.00th=[  173], 99.50th=[  203], 99.90th=[  932], 99.95th=[ 1416],
>      | 99.99th=[ 2960]
>     bw (MB  /s): min= 1080, max= 2121, per=12.51%, avg=1522.33, stdev=253.96
>     lat (usec) : 2=21.63%, 4=37.80%, 10=2.12%, 20=6.43%, 50=16.70%
>     lat (usec) : 100=8.86%, 250=6.07%, 500=0.17%, 750=0.08%, 1000=0.05%
>     lat (msec) : 2=0.06%, 4=0.02%, 10=0.01%, 20=0.01%
>   cpu          : usr=22.39%, sys=64.19%, ctx=15425825, majf=0, minf=97
>   IO depths    : 1=1.8%, 2=1.8%, 4=8.8%, 8=14.4%, 16=12.3%, 32=41.7%, >=64=19.3%
>      submit    : 0=0.0%, 4=5.8%, 8=9.7%, 16=15.0%, 32=18.0%, 64=51.5%, >=64=0.0%
>      complete  : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.1%, 32=0.1%, 64=100.0%, >=64=0.0%
>      issued    : total=r=805764385/w=806127393/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>    READ: io=7133.2GB, aggrb=12174MB/s, minb=12174MB/s, maxb=12174MB/s,
> mint=600001msec, maxt=600001msec
>   WRITE: io=7131.3GB, aggrb=12171MB/s, minb=12171MB/s, maxb=12171MB/s,
> mint=600001msec, maxt=600001msec
> 
> Disk stats (read/write):
>   nullb0: ios=442461761/442546060, merge=363197836/363473703,
> ticks=12280990/12452480, in_queue=2740, util=0.43%
> 
> 1.3 Following is the result after porting.
> 
> fio start   : Fri Jun 15 12:42:47 CEST 2018
> kernel info : Linux ib1 4.14.48-1-pserver-mpbvec+ #12 SMP Fri Jun 15
> 12:21:36 CEST 2018 x86_64 GNU/Linux
> fio version : fio-2.2.10
> fiotest: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K,
> ioengine=libaio, iodepth=64
> ...
> fio-2.2.10
> Starting 8 processes
> Jobs: 4 (f=0): [m(1),_(2),m(1),_(1),m(2),_(1)] [100.0% done]
> [8430MB/8444MB/0KB /s] [961K/963K/0 iops] [eta 00m:00s]
> fiotest: (groupid=0, jobs=8): err= 0: pid=14096: Fri Jun 15 12:52:48 2018
>   read : io=6633.8GB, bw=11322MB/s, iops=1246.9K, runt=600005msec
>     slat (usec): min=1, max=16939, avg=135.34, stdev=156.23
>     clat (usec): min=0, max=16947, avg=26.10, stdev=78.50
>      lat (usec): min=2, max=16957, avg=161.45, stdev=168.88
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
>      | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    5],
>      | 70.00th=[   23], 80.00th=[   37], 90.00th=[   79], 95.00th=[  115],
>      | 99.00th=[  181], 99.50th=[  211], 99.90th=[  948], 99.95th=[ 1416],
>      | 99.99th=[ 2864]
>     bw (MB  /s): min= 1106, max= 2031, per=12.51%, avg=1416.05, stdev=201.81
>   write: io=6631.1GB, bw=11318MB/s, iops=1247.5K, runt=600005msec
>     slat (usec): min=1, max=16938, avg=136.48, stdev=156.54
>     clat (usec): min=0, max=16947, avg=26.08, stdev=78.43
>      lat (usec): min=2, max=16957, avg=162.58, stdev=169.15
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
>      | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    5],
>      | 70.00th=[   23], 80.00th=[   37], 90.00th=[   79], 95.00th=[  115],
>      | 99.00th=[  181], 99.50th=[  211], 99.90th=[  948], 99.95th=[ 1416],
>      | 99.99th=[ 2864]
>     bw (MB  /s): min= 1084, max= 2044, per=12.51%, avg=1415.67, stdev=201.93
>     lat (usec) : 2=20.98%, 4=38.82%, 10=2.15%, 20=5.08%, 50=16.91%
>     lat (usec) : 100=8.75%, 250=6.91%, 500=0.19%, 750=0.09%, 1000=0.05%
>     lat (msec) : 2=0.07%, 4=0.02%, 10=0.01%, 20=0.01%
>   cpu          : usr=21.02%, sys=65.53%, ctx=15321661, majf=0, minf=78
>   IO depths    : 1=1.9%, 2=1.9%, 4=9.5%, 8=13.6%, 16=11.2%, 32=42.1%, >=64=19.9%
>      submit    : 0=0.0%, 4=6.3%, 8=10.1%, 16=14.1%, 32=18.2%,
> 64=51.3%, >=64=0.0%
>      complete  : 0=0.0%, 4=0.1%, 8=0.0%, 16=0.1%, 32=0.1%, 64=100.0%, >=64=0.0%
>      issued    : total=r=748120019/w=748454509/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
> 
> Run status group 0 (all jobs):
>    READ: io=6633.8GB, aggrb=11322MB/s, minb=11322MB/s, maxb=11322MB/s,
> mint=600005msec, maxt=600005msec
>   WRITE: io=6631.1GB, aggrb=11318MB/s, minb=11318MB/s, maxb=11318MB/s,
> mint=600005msec, maxt=600005msec
> 
> Disk stats (read/write):
>   nullb0: ios=410911387/410974086, merge=337127604/337396176,
> ticks=12482050/12662790, in_queue=1780, util=0.27%
> 
> 
> 2. Unixbench
> 
> Second, I ran Unixbench to check general performance.
> I think there is no difference before and after porting the patches.
> Unixbench might not be suitable for checking a performance improvement
> in the block layer.
> If you let me know which tool is suitable, I will try it on my system.
> 
> 2.1 Following is the result before porting.
> 
>    BYTE UNIX Benchmarks (Version 5.1.3)
> 
>    System: ib1: GNU/Linux
>    OS: GNU/Linux -- 4.14.48-1-pserver --
> #4.14.48-1.1+feature+daily+update+20180607.0857+1bbde0b~deb8 SMP
>    Machine: x86_64 (unknown)
>    Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
>    CPU 0: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 1: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 2: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 3: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 4: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 5: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 6: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 7: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    05:00:01 up 3 days, 16:20,  2 users,  load average: 0.00, 0.11,
> 1.11; runlevel 2018-06-07
> 
> ------------------------------------------------------------------------
> Benchmark Run: Mon Jun 11 2018 05:00:01 - 05:28:54
> 8 CPUs in system; running 1 parallel copy of tests
> 
> Dhrystone 2 using register variables       47158867.7 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     3878.8 MWIPS (15.2 s, 7 samples)
> Execl Throughput                               9203.9 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1490834.8 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          388784.2 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3744780.2 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             2682620.1 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                 263786.5 lps   (10.0 s, 7 samples)
> Process Creation                              19674.0 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  16121.5 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   5623.5 lpm   (60.0 s, 2 samples)
> System Call Overhead                        4068991.3 lps   (10.0 s, 7 samples)
> 
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   47158867.7   4041.0
> Double-Precision Whetstone                       55.0       3878.8    705.2
> Execl Throughput                                 43.0       9203.9   2140.4
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1490834.8   3764.7
> File Copy 256 bufsize 500 maxblocks            1655.0     388784.2   2349.1
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3744780.2   6456.5
> Pipe Throughput                               12440.0    2682620.1   2156.4
> Pipe-based Context Switching                   4000.0     263786.5    659.5
> Process Creation                                126.0      19674.0   1561.4
> Shell Scripts (1 concurrent)                     42.4      16121.5   3802.2
> Shell Scripts (8 concurrent)                      6.0       5623.5   9372.5
> System Call Overhead                          15000.0    4068991.3   2712.7
>                                                                    ========
> System Benchmarks Index Score                                        2547.7
> 
> ------------------------------------------------------------------------
> Benchmark Run: Mon Jun 11 2018 05:28:54 - 05:57:07
> 8 CPUs in system; running 8 parallel copies of tests
> 
> Dhrystone 2 using register variables      234727639.9 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                    35350.9 MWIPS (10.7 s, 7 samples)
> Execl Throughput                              43811.3 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1401373.1 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          366033.9 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       4360829.6 KBps  (30.0 s, 2 samples)
> Pipe Throughput                            12875165.6 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2431725.6 lps   (10.0 s, 7 samples)
> Process Creation                              97360.8 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  58879.6 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   9232.5 lpm   (60.0 s, 2 samples)
> System Call Overhead                        9497958.7 lps   (10.0 s, 7 samples)
> 
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0  234727639.9  20113.8
> Double-Precision Whetstone                       55.0      35350.9   6427.4
> Execl Throughput                                 43.0      43811.3  10188.7
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1401373.1   3538.8
> File Copy 256 bufsize 500 maxblocks            1655.0     366033.9   2211.7
> File Copy 4096 bufsize 8000 maxblocks          5800.0    4360829.6   7518.7
> Pipe Throughput                               12440.0   12875165.6  10349.8
> Pipe-based Context Switching                   4000.0    2431725.6   6079.3
> Process Creation                                126.0      97360.8   7727.0
> Shell Scripts (1 concurrent)                     42.4      58879.6  13886.7
> Shell Scripts (8 concurrent)                      6.0       9232.5  15387.5
> System Call Overhead                          15000.0    9497958.7   6332.0
>                                                                    ========
> System Benchmarks Index Score                                        7803.5
> 
> 
> 2.2 Following is the result after porting.
> 
>    BYTE UNIX Benchmarks (Version 5.1.3)
> 
>    System: ib1: GNU/Linux
>    OS: GNU/Linux -- 4.14.48-1-pserver-mpbvec+ -- #12 SMP Fri Jun 15
> 12:21:36 CEST 2018
>    Machine: x86_64 (unknown)
>    Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
>    CPU 0: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 1: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 2: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 3: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 4: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 5: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 6: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    CPU 7: Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz (7008.0 bogomips)
>           Hyper-Threading, x86-64, MMX, Physical Address Ext,
> SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
>    13:16:11 up 50 min,  1 user,  load average: 0.00, 1.40, 3.46;
> runlevel 2018-06-15
> 
> ------------------------------------------------------------------------
> Benchmark Run: Fri Jun 15 2018 13:16:11 - 13:45:04
> 8 CPUs in system; running 1 parallel copy of tests
> 
> Dhrystone 2 using register variables       47103754.6 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     3886.3 MWIPS (15.1 s, 7 samples)
> Execl Throughput                               8965.0 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1510285.9 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          395196.9 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3802788.0 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             2670169.1 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                 275093.8 lps   (10.0 s, 7 samples)
> Process Creation                              19707.1 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  16046.8 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   5600.8 lpm   (60.0 s, 2 samples)
> System Call Overhead                        4104142.0 lps   (10.0 s, 7 samples)
> 
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   47103754.6   4036.3
> Double-Precision Whetstone                       55.0       3886.3    706.6
> Execl Throughput                                 43.0       8965.0   2084.9
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1510285.9   3813.9
> File Copy 256 bufsize 500 maxblocks            1655.0     395196.9   2387.9
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3802788.0   6556.5
> Pipe Throughput                               12440.0    2670169.1   2146.4
> Pipe-based Context Switching                   4000.0     275093.8    687.7
> Process Creation                                126.0      19707.1   1564.1
> Shell Scripts (1 concurrent)                     42.4      16046.8   3784.6
> Shell Scripts (8 concurrent)                      6.0       5600.8   9334.6
> System Call Overhead                          15000.0    4104142.0   2736.1
>                                                                    ========
> System Benchmarks Index Score                                        2560.0
> 
> ------------------------------------------------------------------------
> Benchmark Run: Fri Jun 15 2018 13:45:04 - 14:13:17
> 8 CPUs in system; running 8 parallel copies of tests
> 
> Dhrystone 2 using register variables      237271982.6 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                    35186.8 MWIPS (10.7 s, 7 samples)
> Execl Throughput                              42557.8 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1403922.0 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          367436.5 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       4380468.3 KBps  (30.0 s, 2 samples)
> Pipe Throughput                            12872664.6 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2451404.5 lps   (10.0 s, 7 samples)
> Process Creation                              97788.2 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  58505.9 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   9195.4 lpm   (60.0 s, 2 samples)
> System Call Overhead                        9467372.2 lps   (10.0 s, 7 samples)
> 
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0  237271982.6  20331.8
> Double-Precision Whetstone                       55.0      35186.8   6397.6
> Execl Throughput                                 43.0      42557.8   9897.2
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1403922.0   3545.3
> File Copy 256 bufsize 500 maxblocks            1655.0     367436.5   2220.2
> File Copy 4096 bufsize 8000 maxblocks          5800.0    4380468.3   7552.5
> Pipe Throughput                               12440.0   12872664.6  10347.8
> Pipe-based Context Switching                   4000.0    2451404.5   6128.5
> Process Creation                                126.0      97788.2   7761.0
> Shell Scripts (1 concurrent)                     42.4      58505.9  13798.6
> Shell Scripts (8 concurrent)                      6.0       9195.4  15325.6
> System Call Overhead                          15000.0    9467372.2   6311.6
>                                                                    ========
> System Benchmarks Index Score                                        7794.3

At least for now, BIO_MAX_PAGES can stay fixed at 256 in the CONFIG_THP_SWAP
case; otherwise two pages may have to be allocated just to hold the bvec
table, so tests under THP_SWAP may show an improvement.
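
For context, here is a rough sketch of the table-size arithmetic, written
as a small standalone C program rather than kernel code. The numbers are
illustrative assumptions: a 4 KiB page, a 16-byte struct bio_vec on
64-bit, and HPAGE_PMD_NR = 512 (one 2 MiB THP):

#include <stdio.h>

/* Illustrative values only; the real ones come from the kernel config. */
#define PAGE_SIZE_B    4096u
#define BVEC_SIZE_B    16u    /* assumed sizeof(struct bio_vec) on 64-bit */
#define HPAGE_PMD_NR   512u   /* pages in one 2 MiB THP */

int main(void)
{
	/* Single-page bvecs: one entry per page, so covering a whole THP
	 * for THP_SWAP needs HPAGE_PMD_NR entries. */
	unsigned int singlepage_table = HPAGE_PMD_NR * BVEC_SIZE_B;

	/* Multipage bvecs: the physically contiguous THP fits in a single
	 * entry, so BIO_MAX_PAGES can stay at 256 and the table still
	 * fits in one page. */
	unsigned int multipage_table = 256u * BVEC_SIZE_B;

	printf("single-page table for a THP: %u bytes (%u pages)\n",
	       singlepage_table,
	       (singlepage_table + PAGE_SIZE_B - 1) / PAGE_SIZE_B);
	printf("256-entry multipage table:   %u bytes (%u pages)\n",
	       multipage_table,
	       (multipage_table + PAGE_SIZE_B - 1) / PAGE_SIZE_B);
	return 0;
}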

Also, filesystems may support IO to/from THP, and multipage bvec should
improve that case too.

Long term, there is an opportunity to improve fs code by allocating a
bvec table with only 'nr_segment' entries instead of 'nr_page' entries,
because physically contiguous pages are often allocated by mm for the
same process.
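
As a purely illustrative sketch of that idea (a hypothetical helper, not
kernel code): counting physically contiguous runs instead of individual
pages gives the number of bvec entries such a table would actually need.

#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: given page frame numbers in submission order,
 * count the physically contiguous runs (segments). With multipage
 * bvecs, each run needs only one bvec entry, so the table could be
 * sized by nr_segment instead of nr_page.
 */
static size_t count_segments(const unsigned long *pfns, size_t nr_pages)
{
	size_t segs = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		/* A new segment starts whenever this page is not
		 * adjacent to the previous one. */
		if (i == 0 || pfns[i] != pfns[i - 1] + 1)
			segs++;
	}
	return segs;
}

int main(void)
{
	/* 8 pages, but only 3 physically contiguous runs. */
	unsigned long pfns[] = { 100, 101, 102, 103, 200, 201, 300, 301 };

	printf("pages: 8, segments: %zu\n",
	       count_segments(pfns, sizeof(pfns) / sizeof(pfns[0])));
	return 0;
}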

So this patchset is just a start; at the current stage I am focusing
on making it stable, since storing the multipage segment instead of
each individual page is the correct approach.

Thanks again for your test.

Thanks,
Ming
