linux-ext4 - Re: ext4, barrier, md/RAID1 and write cache


Message-Id: <201205080024.54183.Martin@lichtvoll.de>
Date:	Tue, 8 May 2012 00:24:53 +0200
From:	Martin Steigerwald <Martin@...htvoll.de>
To:	Daniel Pocock <daniel@...ock.com.au>
Cc:	Andreas Dilger <adilger@...ger.ca>, linux-ext4@...r.kernel.org
Subject: Re: ext4, barrier, md/RAID1 and write cache

On Monday, 7 May 2012, Daniel Pocock wrote:
> On 07/05/12 20:59, Martin Steigerwald wrote:
> > On Monday, 7 May 2012, Daniel Pocock wrote:
> >>> Possibly the older disk is lying about doing cache flushes.  The
> >>> wonderful disk manufacturers do that with commodity drives to make
> >>> their benchmark numbers look better.  If you run some random IOPS
> >>> test against this disk, and it has performance much over 100 IOPS
> >>> then it is definitely not doing real cache flushes.
> > 
> > […]
> > 
> > I think an IOPS benchmark would be better. I.e. something like:
> > 
> > /usr/share/doc/fio/examples/ssd-test
> > 
> > (from flexible I/O tester debian package, also included in upstream
> > tarball of course)
> > 
> > adapted to your needs.
> > 
> > Maybe with different iodepth or numjobs (to simulate several threads
> > generating higher iodepths). With iodepth=1 I have seen 54 IOPS on a
> > Hitachi 5400 rpm harddisk connected via eSATA.
> > 
> > Important is direct=1 to bypass the pagecache.
> 
> Thanks for suggesting this tool, I've run it against the USB disk and
> an LV on my AHCI/SATA/md array
> 
> Incidentally, I upgraded the Seagate firmware (model 7200.12 from CC34
> to CC49) and one of the disks went offline shortly after I brought the
> system back up.  To avoid the risk that a bad drive might interfere
> with the SATA performance, I completely removed it before running any
> tests. Tomorrow I'm out to buy some enterprise grade drives, I'm
> thinking about Seagate Constellation SATA or even SAS.
> 
> Anyway, onto the test results:
> 
> USB disk (Seagate  9SD2A3-500 320GB):
> 
> rand-write: (groupid=3, jobs=1): err= 0: pid=22519
>   write: io=46680KB, bw=796512B/s, iops=194, runt= 60012msec
>     slat (usec): min=13, max=25264, avg=106.02, stdev=525.18
>     clat (usec): min=993, max=103568, avg=20444.19, stdev=11622.11
>     bw (KB/s) : min=  521, max= 1224, per=100.06%, avg=777.48, stdev=97.07
>   cpu          : usr=0.73%, sys=2.33%, ctx=12024, majf=0, minf=20
>   IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,

Please repeat the test with iodepth=1.

194 IOPS appears to be highly unrealistic unless NCQ or something like 
that is in use. At least if that's a 5400/7200 RPM SATA drive (I didn't 
check the vendor information).

iodepth=1 should show you what the hardware is capable of without request 
queueing and reordering involved.
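
As a rough cross-check of the numbers above, here is a minimal Python 
sketch. It assumes that total per-request latency is roughly slat + clat 
and applies Little's law to the quoted fio output (all I/Os at depth 4). 
It also estimates what a rotating disk can do with one request in flight. 
The function names and the average seek times are assumptions for 
illustration, not measured values for these drives.

```python
# Rough model of random IOPS for a rotating disk (illustrative only).

def qd1_iops(rpm, avg_seek_ms):
    """Approximate random IOPS with a single outstanding request."""
    rotational_ms = 0.5 * 60_000.0 / rpm  # half a revolution on average
    return 1000.0 / (avg_seek_ms + rotational_ms)

def iops_from_latency(queue_depth, avg_latency_us):
    """Little's law: throughput = requests in flight / time per request."""
    return queue_depth / (avg_latency_us / 1_000_000.0)

# From the quoted fio output: depth 4, avg slat 106.02 usec,
# avg clat 20444.19 usec (their sum is assumed to be the total latency).
implied = iops_from_latency(4, 106.02 + 20444.19)

# Assumed average seek times, NOT taken from any data sheet.
est_7200 = qd1_iops(7200, 8.5)
est_5400 = qd1_iops(5400, 12.0)

print(f"implied by latency at depth 4: {implied:.1f} IOPS")
print(f"estimate, 7200 RPM at depth 1: {est_7200:.1f} IOPS")
print(f"estimate, 5400 RPM at depth 1: {est_5400:.1f} IOPS")
```

Under these assumptions the reported 194 IOPS is what four requests in 
flight at about 20 ms each would produce, while a single outstanding 
request on a 5400 or 7200 RPM disk should land well below 100 IOPS.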

> The IOPS scores look similar, but I checked carefully and I'm fairly
> certain the disks were mounted correctly when the tests ran.
> 
> Should I run this tool over NFS, will the results be meaningful?
> 
> Given the need to replace a drive anyway, I'm really thinking about one
> of the following approaches:
> - same controller, upgrade to enterprise SATA drives
> - buy a dedicated SAS/SATA controller, upgrade to enterprise SATA
> drives
> - buy a dedicated SAS/SATA controller, upgrade to SAS drives
> 
> My HP N36L is quite small, one PCIe x16 slot, the internal drive cage
> has an SFF-8087 (mini SAS) plug, so I'm thinking I can grab something
> small like the Adaptec 1405 - will any of these solutions offer a
> definite win with my NFS issues though?

First I would like to understand more closely what your NFS issues are. 
Before throwing money at the problem, it's important to understand what the 
problem actually is.

Anyway, 15000 RPM SAS drives should give you more IOPS than 7200 RPM SATA 
drives, but SATA drives are cheaper and thus you could - depending on RAID 
level - increase IOPS by just using more drives.
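
To make the "more drives" argument concrete, here is a small Python 
sketch using the common rule-of-thumb write penalties per RAID level. The 
per-drive IOPS figures and the drive counts are hypothetical placeholders, 
and the model ignores caches, controllers and queueing, so treat it as a 
back-of-the-envelope estimate only.

```python
# Rule-of-thumb back-end I/Os needed per front-end write, by RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def array_iops(level, drives, iops_per_drive, write_fraction):
    """Estimate front-end random IOPS of an array (very rough model)."""
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    # Reads cost one back-end I/O, writes cost `penalty` back-end I/Os.
    return raw / ((1.0 - write_fraction) + write_fraction * penalty)

# Hypothetical per-drive figures: 175 IOPS for a 15000 RPM SAS drive,
# 80 IOPS for a 7200 RPM SATA drive. Pure random-write workload.
sas_pair = array_iops("raid1", 2, 175, write_fraction=1.0)
sata_six = array_iops("raid10", 6, 80, write_fraction=1.0)

print(f"2 x SAS,  RAID1:  {sas_pair:.0f} write IOPS")
print(f"6 x SATA, RAID10: {sata_six:.0f} write IOPS")
```

With these assumed inputs, six cheaper drives in RAID10 come out ahead of 
a mirrored pair of fast drives, which is the trade-off described above. 
Whether that holds in practice depends on the actual drives and workload.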

But still, first I'd like to understand *why* it's slow.

What does

iostat -x -d -m 5
vmstat 5

say when exercising the slow (and possibly a faster) setup? See [1].

[1] 
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

(much of this should be relevant when reporting ext4 problems as well)

As for testing over NFS: I expect the values to drop. NFS has quite some 
protocol overhead due to network roundtrips. In my basic tests NFSv4 even 
more so than NFSv3. For NFS I suggest trying the nfsiostat Python script 
from newer nfs-utils. It also shows latencies.
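
A minimal Python sketch of why roundtrips lower the numbers: with one 
synchronous request at a time, each operation costs the disk latency plus 
the network roundtrips it needs. The latencies and the roundtrip counts 
below are made-up illustration values, not measurements of NFSv3 or NFSv4.

```python
def sync_iops(disk_latency_ms, network_rtt_ms, round_trips=1):
    """IOPS with one outstanding synchronous request (rough upper bound)."""
    return 1000.0 / (disk_latency_ms + round_trips * network_rtt_ms)

# Hypothetical values: 12.5 ms per disk operation, 0.5 ms network RTT.
local = sync_iops(12.5, 0.0)
one_trip = sync_iops(12.5, 0.5, round_trips=1)
two_trips = sync_iops(12.5, 0.5, round_trips=2)

print(f"local disk:           {local:.1f} IOPS")
print(f"1 roundtrip per op:   {one_trip:.1f} IOPS")
print(f"2 roundtrips per op:  {two_trips:.1f} IOPS")
```

Every extra roundtrip per operation adds directly to the per-request 
latency, so the achievable IOPS can only go down compared to the local 
test.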

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7