Date:	Tue, 8 May 2012 11:02:19 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Daniel Pocock <daniel@...ock.com.au>
Cc:	Martin Steigerwald <ms@...mix.de>,
	Martin Steigerwald <Martin@...htvoll.de>,
	linux-ext4@...r.kernel.org
Subject: Re: ext4, barrier, md/RAID1 and write cache

On 2012-05-08, at 9:28 AM, Daniel Pocock wrote:
> My impression is that the faster performance of the USB disk was a red
> herring, and the problem really is just the nature of the NFS protocol:
> it is stricter about server-side caching (when sync is enabled) and
> consequently needs more IOPS.
> 
> I've turned two more machines (an HP Z800 with a SATA disk and a Lenovo
> X220 with an SSD) into NFSv3 servers, repeated the same tests, and
> found similar performance on the Z800, but 20x faster on the SSD (which
> can support more IOPS).

Another possible option is to try "-o data=journal" for the ext4
filesystem.  This will, in theory, turn your random IO workload to
the filesystem into a streaming IO workload to the journal.  This
is only useful if the filesystem is not continually busy, and needs
a large enough journal (and enough RAM to match) to handle the burst
IO loads.
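
For reference, a minimal way to try this (the device and mount point
here are examples only; adjust for your setup, and note that ext4
refuses to switch data journalling modes on a live remount):

    umount /export
    mount -t ext4 -o data=journal /dev/md0 /export

    # or store it as a default mount option in the superblock:
    tune2fs -o journal_data /dev/md0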

For example, if you are writing 1GB of data you need a 4GB journal
size and 4GB of RAM to allow all of the data to burst into the journal
and be written into the filesystem asynchronously.  It would also be
interesting to see if there is a benefit from running with an external
journal (possibly on a separate disk or an SSD), because then the
synchronous part of the IO does not seek, and the small IOs can
be safely written to the filesystem asynchronously (they will be
rewritten from the journal if the server crashes).
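
To sketch the external journal setup (device names are examples only;
the filesystem must be unmounted and clean, and the journal device
should use the same block size as the filesystem):

    # create a dedicated journal device on the SSD
    mke2fs -O journal_dev -b 4096 /dev/sdb1

    # drop the internal journal, then attach the external one
    tune2fs -O ^has_journal /dev/md0
    tune2fs -J device=/dev/sdb1 /dev/md0

A larger internal journal can be created the same way, using
"tune2fs -J size=4096" (in megabytes) instead of the device= option,
subject to the journal size limits of your e2fsprogs version.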

Typically, data=journal mode will cut I/O performance in half, since
all data is written twice, but in your case NFS is hurting the
performance far more than this, so the extra "overhead" may still
yield better performance as seen by the clients.
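
As a rough illustration (assuming the md/RAID1 array can stream on the
order of 50MB/s): even halved to ~25MB/s of effective journal
bandwidth, that is still more than an order of magnitude above the
~1MB/s of synchronous random writes shown in the iostat output below.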

>>> All the iostat output is typically like this:
>>> Device:  rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
>>> dm-23      0.00    0.00   0.20  187.60   0.00   0.81     8.89     2.02  10.79   5.07  95.20
>>> dm-23      0.00    0.00   0.20  189.80   0.00   0.91     9.84     1.95  10.29   4.97  94.48
>>> dm-23      0.00    0.00   0.20  228.60   0.00   1.00     8.92     1.97   8.58   4.10  93.92
>>> dm-23      0.00    0.00   0.20  231.80   0.00   0.98     8.70     1.96   8.49   4.06  94.16
>>> dm-23      0.00    0.00   0.20  229.20   0.00   0.94     8.40     1.92   8.39   4.10  94.08
>> 
>> Hmmm, the disk looks quite utilized. Are there other I/O workloads on
>> the machine?
> 
> No, just me testing it

Looking at these results, the average IO size is very small.  With
around 210 writes/second and a write bandwidth of about 1MB/s, the
average write size is only about 4.5kB.
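
As a quick check of that arithmetic (numbers taken from the dm-23
samples above):

    echo 'scale=1; 0.93 * 1024 / 213' | bc   # ~0.93 MB/s over ~213 w/s -> 4.4kB

which also agrees with the avgrq-sz column: ~8.9 sectors * 512 bytes
is about 4.5kB per request.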

Cheers, Andreas





