linux-kernel - Re: [RFC] ext3: per-process soft-syncing data=ordered mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200801311214.55287.chris.mason@oracle.com>
Date:	Thu, 31 Jan 2008 12:14:54 -0500
From:	Chris Mason <chris.mason@...cle.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Al Boldi <a1426z@...ab.com>, Andreas Dilger <adilger@....com>,
	Chris Snook <csnook@...hat.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode

On Thursday 31 January 2008, Jan Kara wrote:
> On Thu 31-01-08 11:56:01, Chris Mason wrote:
> > On Thursday 31 January 2008, Al Boldi wrote:
> > > Andreas Dilger wrote:
> > > > On Wednesday 30 January 2008, Al Boldi wrote:
> > > > > And, a quick test of successive 1sec delayed syncs shows no hangs
> > > > > until about 1 minute (~180mb) of db-writeout activity, when the
> > > > > sync abruptly hangs for minutes on end, and io-wait shows almost
> > > > > 100%.
> > > >
> > > > How large is the journal in this filesystem?  You can check via
> > > > "debugfs -R 'stat <8>' /dev/XXX".
> > >
> > > 32mb.
> > >
> > > > Is this affected by increasing
> > > > the journal size?  You can set the journal size via "mke2fs -J
> > > > size=400" at format time, or on an unmounted filesystem by running
> > > > "tune2fs -O ^has_journal /dev/XXX" then "tune2fs -J size=400
> > > > /dev/XXX".
> > >
> > > Setting size=400 doesn't help, nor does size=4.
> > >
> > > > I suspect that the stall is caused by the journal filling up, and
> > > > then waiting while the entire journal is checkpointed back to the
> > > > filesystem before the next transaction can start.
> > > >
> > > > It is possible to improve this behaviour in JBD by reducing the
> > > > amount of space that is cleared if the journal becomes "full", and
> > > > also doing journal checkpointing before it becomes full.  While that
> > > > may reduce performance a small amount, it would help avoid such huge
> > > > latency problems. I believe we have such a patch in one of the Lustre
> > > > branches already, and while I'm not sure what kernel it is for the
> > > > JBD code rarely changes much....
> > >
> > > The big difference between ordered and writeback is that once the
> > > slowdown starts, ordered goes into ~100% iowait, whereas writeback
> > > continues 100% user.
> >
> > Does data=ordered write buffers in the order they were dirtied?  This
> > might explain the extreme problems in transactional workloads.
>
>   Well, it does but we submit them to block layer all at once so elevator
> should sort the requests for us...

nr_requests is fairly small, so a long stream of random requests should still 
end up being random IO.

Al, could you please compare the write throughput from vmstat for the 
data=ordered vs data=writeback runs?  I would guess the data=ordered one has 
a lower overall write throughput.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/