[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080204175415.GH3426@duck.suse.cz>
Date: Mon, 4 Feb 2008 18:54:15 +0100
From: Jan Kara <jack@...e.cz>
To: Al Boldi <a1426z@...ab.com>
Cc: Chris Mason <chris.mason@...cle.com>,
Andreas Dilger <adilger@....com>,
Chris Snook <csnook@...hat.com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] ext3: per-process soft-syncing data=ordered mode
On Sat 02-02-08 00:26:00, Al Boldi wrote:
> Chris Mason wrote:
> > On Thursday 31 January 2008, Jan Kara wrote:
> > > On Thu 31-01-08 11:56:01, Chris Mason wrote:
> > > > On Thursday 31 January 2008, Al Boldi wrote:
> > > > > The big difference between ordered and writeback is that once the
> > > > > slowdown starts, ordered goes into ~100% iowait, whereas writeback
> > > > > continues 100% user.
> > > >
> > > > Does data=ordered write buffers in the order they were dirtied? This
> > > > might explain the extreme problems in transactional workloads.
> > >
> > > Well, it does but we submit them to block layer all at once so
> > > elevator should sort the requests for us...
> >
> > nr_requests is fairly small, so a long stream of random requests should
> > still end up being random IO.
> >
> > Al, could you please compare the write throughput from vmstat for the
> > data=ordered vs data=writeback runs? I would guess the data=ordered one
> > has a lower overall write throughput.
>
> That's what I would have guessed, but it's actually going up 4x fold for
> mysql from 559mb to 2135mb, while the db-size ends up at 549mb.
So you say we write 4-times as much data in ordered mode as in writeback
mode. Hmm, probably possible because we force all the dirty data to disk
when committing a transation in ordered mode (and don't do this in
writeback mode). So if the workload repeatedly dirties the whole DB, we are
going to write the whole DB several times in ordered mode but in writeback
mode we just keep the data in memory all the time. But this is what you
ask for if you mount in ordered mode so I wouldn't consider it a bug.
I still don't like your hack with per-process journal mode setting but we
could easily do per-file journal mode setting (we already have a flag to do
data journaling for a file) and that would help at least your DB
workload...
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists