Message-ID: <20120301194735.GD32588@thunk.org>
Date:	Thu, 1 Mar 2012 14:47:35 -0500
From:	Ted Ts'o <tytso@....edu>
To:	Xupeng Yun <xupeng@...eng.me>
Cc:	Ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Bad performance of ext4 with kernel 3.0.17

Two things I'd try:

#1) If this is a freshly created file system, the kernel may be
initializing the inode table in the background, and this could be
interfering with your benchmark workload.  To address this, you can
either (a) add the mount option noinit_itable, (b) add the mke2fs
option "-E lazy_itable_init=0" --- but this will cause the mke2fs to
take a lot longer, or (c) mount the file system and wait until
"dumpe2fs /dev/md3 | tail" shows that the last block group has the
ITABLE_ZEROED flag set.  For benchmarking purposes on a scratch
workload, option (a) above is the fastest thing to do.
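Concretely, the three options would look something like this (the mount
point /mnt/test is taken from your report; adjust as needed):

```shell
# (a) Skip background inode-table zeroing for this mount (scratch/benchmark use):
mount -o noinit_itable /dev/md3 /mnt/test

# (b) Zero the inode tables at mkfs time instead; mke2fs itself takes much longer:
mke2fs -t ext4 -E lazy_itable_init=0 /dev/md3

# (c) Or mount normally and poll until the last block group shows ITABLE_ZEROED:
dumpe2fs /dev/md3 | tail
```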

#2) It could be that the file system is choosing blocks farther away
from the beginning of the disk, which is slower, whereas the fio on
the raw disk will use the blocks closest to the beginning of the disk,
which are the fastest ones.  You could try creating the file system so
it is only 10GB, and then try running fio on that small, truncated
file system, and see if that makes a difference.
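One way to do that is to pass an explicit block count to mke2fs so the
file system only occupies the start of the device (with the default 4KiB
block size, 10GiB is 2621440 blocks; the stride/stripe-width values are
the ones from your mkfs command):

```shell
# Create a ~10GiB file system at the beginning of /dev/md3:
# 10 GiB / 4 KiB per block = 2621440 blocks
mke2fs -t ext4 -E stride=16,stripe-width=32 /dev/md3 2621440
mount /dev/md3 /mnt/test
# fio runs against /mnt/test will now only touch blocks near the
# start of the disks, comparable to the raw-device test.
```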

     	     	     	     	   - Ted


On Thu, Mar 01, 2012 at 01:31:58PM +0800, Xupeng Yun wrote:
> I just set up a new server (Gentoo 64bit with kernel 3.0.17) with 4 x
> 15000RPM SAS disks(sdc, sdd, sde and sdf), and created soft RAID10 on
> top of them, the partitions are aligned at 1MB:
> 
>     # fdisk -lu /dev/sd{c,e,d,f}
> 
>     Disk /dev/sdc: 600.1 GB, 600127266816 bytes
>     255 heads, 63 sectors/track, 72961 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xdd96eace
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdc1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sde: 600.1 GB, 600127266816 bytes
>     3 heads, 63 sectors/track, 6201712 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xf869ba1c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sde1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sdd: 600.1 GB, 600127266816 bytes
>     81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xf869ba1c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdd1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sdf: 600.1 GB, 600127266816 bytes
>     81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xb4893c3c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdf1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
> 
> and here is the RAID 10 (md3) with 64K chunk size:
> 
>     cat /proc/mdstat
>     Personalities : [raid0] [raid1] [raid10]
>     md3 : active raid10 sdf1[3] sde1[2] sdd1[1] sdc1[0]
>           1172121344 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
>     md1 : active raid1 sda1[0] sdb1[1]
>           112320 blocks [2/2] [UU]
> 
>     md2 : active raid1 sda2[0] sdb2[1]
>           41953664 blocks [2/2] [UU]
> 
>     unused devices: <none>
> 
> I did IO testing with `fio` against the raw RAID device (md3), and the
> result looks good (read IOPS 1723 / write IOPS 168):
> 
>     # fio --filename=/dev/md3 --direct=1 --rw=randrw --bs=16k
> --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
> --rwmixread=90 --thread --ioengine=psync
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     ...
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     fio 2.0.3
>     Starting 16 threads
>     Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [28234K/2766K
> /s] [1723 /168  iops] [eta 00m:00s]
>     file1: (groupid=0, jobs=16): err= 0: pid=17107
>       read : io=1606.3MB, bw=27406KB/s, iops=1712 , runt= 60017msec
>         clat (usec): min=221 , max=123233 , avg=7693.00, stdev=7734.82
>          lat (usec): min=221 , max=123233 , avg=7693.12, stdev=7734.82
>         clat percentiles (usec):
>          |  1.00th=[ 1128],  5.00th=[ 1560], 10.00th=[ 1928], 20.00th=[ 2640],
>          | 30.00th=[ 3376], 40.00th=[ 4128], 50.00th=[ 4896], 60.00th=[ 6304],
>          | 70.00th=[ 8256], 80.00th=[11200], 90.00th=[16768], 95.00th=[23168],
>          | 99.00th=[38656], 99.50th=[45824], 99.90th=[62720]
>         bw (KB/s)  : min=  888, max=13093, per=7.59%, avg=2079.11, stdev=922.54
>       write: io=183840KB, bw=3063.2KB/s, iops=191 , runt= 60017msec
>         clat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
>          lat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
>         clat percentiles (usec):
>          |  1.00th=[ 1816],  5.00th=[ 2544], 10.00th=[ 3248], 20.00th=[ 4512],
>          | 30.00th=[ 5728], 40.00th=[ 7648], 50.00th=[ 9536], 60.00th=[12480],
>          | 70.00th=[16320], 80.00th=[22144], 90.00th=[32640], 95.00th=[43264],
>          | 99.00th=[71168], 99.50th=[82432], 99.90th=[111104]
>         bw (KB/s)  : min=   90, max= 5806, per=33.81%, avg=1035.45, stdev=973.10
>         lat (usec) : 250=0.05%, 500=0.09%, 750=0.05%, 1000=0.19%
>         lat (msec) : 2=9.61%, 4=26.05%, 10=38.46%, 20=16.82%, 50=8.02%
>         lat (msec) : 100=0.63%, 250=0.03%
>       cpu          : usr=1.02%, sys=2.87%, ctx=1926728, majf=0, minf=288891
>       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> 32=0.0%, >=64=0.0%
>          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          issued    : total=r=102801/w=11490/d=0, short=r=0/w=0/d=0
> 
>     Run status group 0 (all jobs):
>        READ: io=1606.3MB, aggrb=27405KB/s, minb=28063KB/s,
> maxb=28063KB/s, mint=60017msec, maxt=60017msec
>       WRITE: io=183840KB, aggrb=3063KB/s, minb=3136KB/s,
> maxb=3136KB/s, mint=60017msec, maxt=60017msec
> 
>     Disk stats (read/write):
>         md3: ios=102753/11469, merge=0/0, ticks=0/0, in_queue=0,
> util=0.00%, aggrios=25764/5746, aggrmerge=0/0, aggrticks=197378/51351,
> aggrin_queue=248718, aggrutil=99.31%
>       sdc: ios=26256/5723, merge=0/0, ticks=204328/68364,
> in_queue=272668, util=99.20%
>       sdd: ios=25290/5723, merge=0/0, ticks=187572/61628,
> in_queue=249188, util=98.73%
>       sde: ios=25689/5769, merge=0/0, ticks=197340/71828,
> in_queue=269172, util=99.31%
>       sdf: ios=25822/5769, merge=0/0, ticks=200272/3584,
> in_queue=203844, util=97.87%
> 
> then I created ext4 filesystem on top of the RAID device and mounted
> it to /mnt/test:
> 
>     mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3
>     mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
> 
> after that I did the very same IO testing, but the result looks very
> bad (read IOPS 926 / write IOPS 97):
> 
>     # fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k
> --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
> --rwmixread=90 --thread --ioengine=psync
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     ...
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     fio 2.0.3
>     Starting 16 threads
>     file1: Laying out IO file(s) (1 file(s) / 5120MB)
>     Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [15172K/1604K
> /s] [926 /97  iops] [eta 00m:00s]
>     file1: (groupid=0, jobs=16): err= 0: pid=18764
>       read : io=838816KB, bw=13974KB/s, iops=873 , runt= 60025msec
>         clat (usec): min=228 , max=111583 , avg=16412.46, stdev=11632.03
>          lat (usec): min=228 , max=111583 , avg=16412.60, stdev=11632.03
>         clat percentiles (usec):
>          |  1.00th=[ 1384],  5.00th=[ 2320], 10.00th=[ 3376], 20.00th=[ 5216],
>          | 30.00th=[ 8256], 40.00th=[11456], 50.00th=[14656], 60.00th=[17792],
>          | 70.00th=[21376], 80.00th=[25472], 90.00th=[32128], 95.00th=[37632],
>          | 99.00th=[50944], 99.50th=[56576], 99.90th=[70144]
>         bw (KB/s)  : min=  308, max= 4448, per=6.90%, avg=964.30, stdev=339.53
>       write: io=94208KB, bw=1569.5KB/s, iops=98 , runt= 60025msec
>         clat (msec): min=1 , max=89 , avg=16.91, stdev=10.24
>          lat (msec): min=1 , max=89 , avg=16.92, stdev=10.24
>         clat percentiles (usec):
>          |  1.00th=[ 2384],  5.00th=[ 3888], 10.00th=[ 5088], 20.00th=[ 7776],
>          | 30.00th=[10304], 40.00th=[12736], 50.00th=[15296], 60.00th=[17792],
>          | 70.00th=[20864], 80.00th=[24960], 90.00th=[30848], 95.00th=[35584],
>          | 99.00th=[47360], 99.50th=[51456], 99.90th=[62208]
>         bw (KB/s)  : min=   31, max= 4676, per=62.37%, avg=978.64, stdev=896.53
>         lat (usec) : 250=0.01%, 500=0.03%, 750=0.01%, 1000=0.06%
>         lat (msec) : 2=3.15%, 4=9.42%, 10=22.23%, 20=31.61%, 50=32.39%
>         lat (msec) : 100=1.08%, 250=0.01%
>       cpu          : usr=0.59%, sys=2.63%, ctx=1700318, majf=0, minf=19888
>       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> 32=0.0%, >=64=0.0%
>          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          issued    : total=r=52426/w=5888/d=0, short=r=0/w=0/d=0
> 
>     Run status group 0 (all jobs):
>        READ: io=838816KB, aggrb=13974KB/s, minb=14309KB/s,
> maxb=14309KB/s, mint=60025msec, maxt=60025msec
>       WRITE: io=94208KB, aggrb=1569KB/s, minb=1607KB/s, maxb=1607KB/s,
> mint=60025msec, maxt=60025msec
> 
>     Disk stats (read/write):
>         md3: ios=58848/13987, merge=0/0, ticks=0/0, in_queue=0,
> util=0.00%, aggrios=14750/4159, aggrmerge=0/2861,
> aggrticks=112418/28260, aggrin_queue=140664, aggrutil=84.95%
>       sdc: ios=17688/4221, merge=0/2878, ticks=148664/37972,
> in_queue=186628, util=84.95%
>       sdd: ios=11801/4219, merge=0/2880, ticks=79396/29192,
> in_queue=108572, util=70.71%
>       sde: ios=16427/4099, merge=0/2843, ticks=129072/35252,
> in_queue=164304, util=81.57%
>       sdf: ios=13086/4097, merge=0/2845, ticks=92540/10624,
> in_queue=103152, util=60.02%
> 
> is anything going wrong here?
> 
> 
> --
> Xupeng Yun
> http://about.me/xupeng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html