Date:	Thu, 3 Sep 2009 15:52:01 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Theodore Tso <tytso@....edu>, Chris Mason <chris.mason@...cle.com>,
	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: ext4 writepages is making tiny bios?

On Tue, Sep 01, 2009 at 05:27:40PM -0400, Christoph Hellwig wrote:
> On Tue, Sep 01, 2009 at 04:57:44PM -0400, Theodore Tso wrote:
> > > This graph shows the difference:
> > > 
> > > http://oss.oracle.com/~mason/seekwatcher/trace-buffered.png
> > 
> > Wow, I'm surprised how seeky XFS was in these graphs compared to ext4
> > and btrfs.  I wonder what was going on.
> 
> XFS made the mistake of trusting the VM, while everyone else more or
> less overrode it.  Removing all those checks and writing out much
> larger amounts of data fixes it with a relatively small patch:
> 
> 	http://verein.lst.de/~hch/xfs/xfs-writeback-scaling

Careful:

-	tloff = min(tlast, startpage->index + 64);
+	tloff = min(tlast, startpage->index + 8192);

That will cause 64k page machines to try to write back 512MB at a
time. This will re-introduce behaviour similar to that on sles9, where
writeback would only terminate at the end of an extent (because the
mapping end wasn't capped like above).
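(For the arithmetic: 8192 pages * 64k/page = 512MB per writeback
call, versus 32MB with 4k pages.) Capping the window in bytes rather
than pages would keep the behaviour consistent across page sizes.
Untested sketch, and the 32MB is illustrative rather than a tuned
number:

	/*
	 * Cap the writeback window in bytes, not pages, so a 64k
	 * page machine doesn't write back 16x more than a 4k one.
	 */
	tloff = min(tlast, startpage->index +
			((32 * 1024 * 1024) >> PAGE_CACHE_SHIFT));
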

This has two nasty side effects:

	1. horrible fsync latency when streaming writes are
	   occurring (e.g. NFS writes), which limits throughput
	2. a single large streaming write could delay the writeback
	   of thousands of small files indefinitely.

#1 is still an issue, but #2 might not be so bad compared to sles9
given the way inodes are cycled during writeback now...

> When that code was last benchmarked extensively (on SLES9) it
> worked nicely to saturate extremely large machines using buffered
> I/O; since then VM tuning has basically destroyed it.

It was removed because it caused all sorts of problems and buffered
writes on sles9 were limited by lock contention in XFS, not the VM.
On 2.6.15, pdflush and the code the above patch removes was capable
of pushing more than 6GB/s of buffered writes to a single block
device. VM writeback has gone steadily downhill since then...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com