Message-ID: <20120301194735.GD32588@thunk.org>
Date: Thu, 1 Mar 2012 14:47:35 -0500
From: Ted Ts'o <tytso@....edu>
To: Xupeng Yun <xupeng@...eng.me>
Cc: Ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Bad performance of ext4 with kernel 3.0.17
Two things I'd try:
#1) If this is a freshly created file system, the kernel may be
initializing the inode table in the background, and this could be
interfering with your benchmark workload. To address this, you can
either (a) add the mount option noinit_itable, (b) add the mke2fs
option "-E lazy_itable_init=0" (but this will make mke2fs take a
lot longer), or (c) mount the file system and wait until
"dumpe2fs /dev/md3 | tail" shows that the last block group has the
ITABLE_ZEROED flag set. For benchmarking purposes on a scratch
workload, option (a) above is the fastest thing to do.
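A sketch of the three options above, assuming the device (/dev/md3) and mount point (/mnt/test) from this thread. It also assumes dumpe2fs prints one "Group N: ..." header line per block group with that group's flags in brackets; `last_group_zeroed` is a hypothetical helper, not part of e2fsprogs. The destructive mkfs and mount commands are left as comments.

```shell
#!/bin/sh
# (a) Mount without the background inode-table zeroing thread:
#   mount /dev/md3 /mnt/test -o noinit_itable,noatime,nodiratime
# (b) Zero every inode table at mkfs time instead (mkfs becomes much slower):
#   mkfs.ext4 -E lazy_itable_init=0,stride=16,stripe-width=32 /dev/md3

# (c) Hypothetical helper: read dumpe2fs output on stdin and succeed only
# if the last "Group N:" header line carries the ITABLE_ZEROED flag.
# Assumed header format, for example:
#   Group 7: (Blocks 229376-262143) [INODE_UNINIT, ITABLE_ZEROED]
last_group_zeroed() {
    grep '^Group ' | tail -n 1 | grep -q 'ITABLE_ZEROED'
}

# Poll every 30 seconds until lazy initialization has finished:
#   until dumpe2fs /dev/md3 2>/dev/null | last_group_zeroed; do sleep 30; done
```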
#2) It could be that the file system is choosing blocks farther away
from the beginning of the disk, which is slower, whereas fio on the
raw disk (with --size=5G) will only touch the blocks closest to the
beginning of the disk, which are the fastest ones. You could try
creating the file system so
it is only 10GB, and then try running fio on that small, truncated
file system, and see if that makes a difference.
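For suggestion #2, mke2fs accepts an optional size after the device name, and when -b is given that size is counted in file-system blocks. A sketch, assuming a 4 KiB block size and reading "10GB" as 10 GiB; `fs_blocks` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: number of BLOCK_SIZE-byte blocks in SIZE_GIB GiB.
fs_blocks() {
    size_gib=$1
    block_size=$2
    echo $(( size_gib * 1024 * 1024 * 1024 / block_size ))
}

# Destructive: re-creates the file system using only the first 10 GiB of md3,
# keeping the RAID stride settings from the original mkfs.ext4 command.
#   mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md3 "$(fs_blocks 10 4096)"
#   mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
```

With these values `fs_blocks 10 4096` prints 2621440, and the 5G fio test file still fits comfortably.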
- Ted
On Thu, Mar 01, 2012 at 01:31:58PM +0800, Xupeng Yun wrote:
> I just set up a new server (Gentoo 64-bit with kernel 3.0.17) with 4 x
> 15000RPM SAS disks (sdc, sdd, sde and sdf) and created a software RAID10
> on top of them. The partitions are aligned at 1MB:
>
> # fdisk -lu /dev/sd{c,e,d,f}
>
> Disk /dev/sdc: 600.1 GB, 600127266816 bytes
> 255 heads, 63 sectors/track, 72961 cylinders, total 1172123568 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0xdd96eace
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1            2048  1172123567   586060760   fd  Linux raid autodetect
>
> Disk /dev/sde: 600.1 GB, 600127266816 bytes
> 3 heads, 63 sectors/track, 6201712 cylinders, total 1172123568 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0xf869ba1c
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sde1            2048  1172123567   586060760   fd  Linux raid autodetect
>
> Disk /dev/sdd: 600.1 GB, 600127266816 bytes
> 81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0xf869ba1c
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1            2048  1172123567   586060760   fd  Linux raid autodetect
>
> Disk /dev/sdf: 600.1 GB, 600127266816 bytes
> 81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0xb4893c3c
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdf1            2048  1172123567   586060760   fd  Linux raid autodetect
>
>
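The 1MB alignment claim can be checked from the fdisk numbers: each partition starts at sector 2048 with 512-byte sectors, i.e. at byte 1048576, which is also a multiple of the 64K RAID chunk. A minimal sketch; `is_aligned` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a partition starting at START_SECTOR
# (with SECTOR_SIZE-byte sectors) begins on an ALIGN_BYTES boundary.
is_aligned() {
    start_sector=$1
    sector_size=$2
    align_bytes=$3
    [ $(( start_sector * sector_size % align_bytes )) -eq 0 ]
}

# Values from the fdisk listing: start sector 2048, 512-byte sectors.
is_aligned 2048 512 $((1024 * 1024)) && echo "aligned to 1 MiB"
is_aligned 2048 512 $((64 * 1024)) && echo "aligned to the 64 KiB chunk"
```
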
> And here is the RAID10 array (md3) with a 64K chunk size:
>
> cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10]
> md3 : active raid10 sdf1[3] sde1[2] sdd1[1] sdc1[0]
> 1172121344 blocks 64K chunks 2 near-copies [4/4] [UUUU]
>
> md1 : active raid1 sda1[0] sdb1[1]
> 112320 blocks [2/2] [UU]
>
> md2 : active raid1 sda2[0] sdb2[1]
> 41953664 blocks [2/2] [UU]
>
> unused devices: <none>
>
> I did IO testing with `fio` against the raw RAID device (md3), and the
> result looks good (read IOPS 1723 / write IOPS 168):
>
> # fio --filename=/dev/md3 --direct=1 --rw=randrw --bs=16k \
>   --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1 \
>   --rwmixread=90 --thread --ioengine=psync
> file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
> ...
> file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
> fio 2.0.3
> Starting 16 threads
> Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [28234K/2766K /s] [1723 /168 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=16): err= 0: pid=17107
> read : io=1606.3MB, bw=27406KB/s, iops=1712 , runt= 60017msec
> clat (usec): min=221 , max=123233 , avg=7693.00, stdev=7734.82
> lat (usec): min=221 , max=123233 , avg=7693.12, stdev=7734.82
> clat percentiles (usec):
> | 1.00th=[ 1128], 5.00th=[ 1560], 10.00th=[ 1928], 20.00th=[ 2640],
> | 30.00th=[ 3376], 40.00th=[ 4128], 50.00th=[ 4896], 60.00th=[ 6304],
> | 70.00th=[ 8256], 80.00th=[11200], 90.00th=[16768], 95.00th=[23168],
> | 99.00th=[38656], 99.50th=[45824], 99.90th=[62720]
> bw (KB/s) : min= 888, max=13093, per=7.59%, avg=2079.11, stdev=922.54
> write: io=183840KB, bw=3063.2KB/s, iops=191 , runt= 60017msec
> clat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
> lat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
> clat percentiles (usec):
> | 1.00th=[ 1816], 5.00th=[ 2544], 10.00th=[ 3248], 20.00th=[ 4512],
> | 30.00th=[ 5728], 40.00th=[ 7648], 50.00th=[ 9536], 60.00th=[12480],
> | 70.00th=[16320], 80.00th=[22144], 90.00th=[32640], 95.00th=[43264],
> | 99.00th=[71168], 99.50th=[82432], 99.90th=[111104]
> bw (KB/s) : min= 90, max= 5806, per=33.81%, avg=1035.45, stdev=973.10
> lat (usec) : 250=0.05%, 500=0.09%, 750=0.05%, 1000=0.19%
> lat (msec) : 2=9.61%, 4=26.05%, 10=38.46%, 20=16.82%, 50=8.02%
> lat (msec) : 100=0.63%, 250=0.03%
> cpu : usr=1.02%, sys=2.87%, ctx=1926728, majf=0, minf=288891
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued : total=r=102801/w=11490/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> READ: io=1606.3MB, aggrb=27405KB/s, minb=28063KB/s, maxb=28063KB/s, mint=60017msec, maxt=60017msec
> WRITE: io=183840KB, aggrb=3063KB/s, minb=3136KB/s, maxb=3136KB/s, mint=60017msec, maxt=60017msec
>
> Disk stats (read/write):
> md3: ios=102753/11469, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=25764/5746, aggrmerge=0/0, aggrticks=197378/51351, aggrin_queue=248718, aggrutil=99.31%
> sdc: ios=26256/5723, merge=0/0, ticks=204328/68364, in_queue=272668, util=99.20%
> sdd: ios=25290/5723, merge=0/0, ticks=187572/61628, in_queue=249188, util=98.73%
> sde: ios=25689/5769, merge=0/0, ticks=197340/71828, in_queue=269172, util=99.31%
> sdf: ios=25822/5769, merge=0/0, ticks=200272/3584, in_queue=203844, util=97.87%
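Since the same workload is run twice against different targets, one way to guarantee identical parameters is to keep them in a fio job file (fio accepts the same option names in a job file as on the command line, without the leading dashes). A sketch under that assumption; `write_fio_job` and the .fio file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a fio job file equivalent to the command line
# used in this thread. $1 is the device or file to test, $2 the output path.
write_fio_job() {
    target=$1
    out=$2
    cat > "$out" <<EOF
[global]
direct=1
rw=randrw
rwmixread=90
bs=16k
size=5G
numjobs=16
runtime=60
thread
ioengine=psync
group_reporting

[file1]
filename=$target
EOF
}

# Usage (not run here):
#   write_fio_job /dev/md3 raw.fio && fio raw.fio
#   write_fio_job /mnt/test/test ext4.fio && fio ext4.fio
```
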
>
> Then I created an ext4 file system on top of the RAID device and mounted
> it at /mnt/test:
>
> mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3
> mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
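The -E stride=16,stripe-width=32 values follow from the array geometry: stride is the chunk size divided by the file-system block size (64 KiB / 4 KiB = 16), and stripe-width is stride times the number of data-bearing disks, which for a 4-disk RAID10 with 2 near-copies is 4 / 2 = 2, giving 32. A sketch assuming 4 KiB ext4 blocks; `stripe_params` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print mke2fs -E stride/stripe-width values.
# Args: chunk size (KiB), block size (KiB), disks in the array, copies per block.
stripe_params() {
    chunk_kib=$1
    block_kib=$2
    disks=$3
    copies=$4
    stride=$(( chunk_kib / block_kib ))
    data_disks=$(( disks / copies ))
    echo "stride=${stride},stripe-width=$(( stride * data_disks ))"
}

# md3: 64K chunks, 4 KiB ext4 blocks, 4 disks, 2 near-copies.
stripe_params 64 4 4 2   # prints stride=16,stripe-width=32
```
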
>
> After that I ran the very same IO test, but the result looks very
> bad (read IOPS 926 / write IOPS 97):
>
> # fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k \
>   --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1 \
>   --rwmixread=90 --thread --ioengine=psync
> file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
> ...
> file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
> fio 2.0.3
> Starting 16 threads
> file1: Laying out IO file(s) (1 file(s) / 5120MB)
> Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [15172K/1604K /s] [926 /97 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=16): err= 0: pid=18764
> read : io=838816KB, bw=13974KB/s, iops=873 , runt= 60025msec
> clat (usec): min=228 , max=111583 , avg=16412.46, stdev=11632.03
> lat (usec): min=228 , max=111583 , avg=16412.60, stdev=11632.03
> clat percentiles (usec):
> | 1.00th=[ 1384], 5.00th=[ 2320], 10.00th=[ 3376], 20.00th=[ 5216],
> | 30.00th=[ 8256], 40.00th=[11456], 50.00th=[14656], 60.00th=[17792],
> | 70.00th=[21376], 80.00th=[25472], 90.00th=[32128], 95.00th=[37632],
> | 99.00th=[50944], 99.50th=[56576], 99.90th=[70144]
> bw (KB/s) : min= 308, max= 4448, per=6.90%, avg=964.30, stdev=339.53
> write: io=94208KB, bw=1569.5KB/s, iops=98 , runt= 60025msec
> clat (msec): min=1 , max=89 , avg=16.91, stdev=10.24
> lat (msec): min=1 , max=89 , avg=16.92, stdev=10.24
> clat percentiles (usec):
> | 1.00th=[ 2384], 5.00th=[ 3888], 10.00th=[ 5088], 20.00th=[ 7776],
> | 30.00th=[10304], 40.00th=[12736], 50.00th=[15296], 60.00th=[17792],
> | 70.00th=[20864], 80.00th=[24960], 90.00th=[30848], 95.00th=[35584],
> | 99.00th=[47360], 99.50th=[51456], 99.90th=[62208]
> bw (KB/s) : min= 31, max= 4676, per=62.37%, avg=978.64, stdev=896.53
> lat (usec) : 250=0.01%, 500=0.03%, 750=0.01%, 1000=0.06%
> lat (msec) : 2=3.15%, 4=9.42%, 10=22.23%, 20=31.61%, 50=32.39%
> lat (msec) : 100=1.08%, 250=0.01%
> cpu : usr=0.59%, sys=2.63%, ctx=1700318, majf=0, minf=19888
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued : total=r=52426/w=5888/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> READ: io=838816KB, aggrb=13974KB/s, minb=14309KB/s, maxb=14309KB/s, mint=60025msec, maxt=60025msec
> WRITE: io=94208KB, aggrb=1569KB/s, minb=1607KB/s, maxb=1607KB/s, mint=60025msec, maxt=60025msec
>
> Disk stats (read/write):
> md3: ios=58848/13987, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=14750/4159, aggrmerge=0/2861, aggrticks=112418/28260, aggrin_queue=140664, aggrutil=84.95%
> sdc: ios=17688/4221, merge=0/2878, ticks=148664/37972, in_queue=186628, util=84.95%
> sdd: ios=11801/4219, merge=0/2880, ticks=79396/29192, in_queue=108572, util=70.71%
> sde: ios=16427/4099, merge=0/2843, ticks=129072/35252, in_queue=164304, util=81.57%
> sdf: ios=13086/4097, merge=0/2845, ticks=92540/10624, in_queue=103152, util=60.02%
>
> Is anything going wrong here?
>
>
> --
> Xupeng Yun
> http://about.me/xupeng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html