[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100330014700.GD18791@atrey.karlin.mff.cuni.cz>
Date: Tue, 30 Mar 2010 03:47:00 +0200
From: Jan Kara <jack@...e.cz>
To: Dmitry Monakhov <dmonakhov@...nvz.org>
Cc: tytso@....edu, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 1/2] ext4/jbd2: fix io-barrier logic in case of
external journal
> tytso@....edu writes:
>
> > On Mon, Mar 22, 2010 at 05:04:19PM +0300, Dmitry Monakhov wrote:
> >> tytso@....edu writes:
> >>
> >> > On Fri, Mar 12, 2010 at 08:26:49PM +0300, Dmitry Monakhov wrote:
> >> >> start_journal_io:
> >> >> + if (bufs)
> >> >> + commit_transaction->t_flushed_data_blocks = 1;
> >> >> +
> >> >
> >> > I'm not convinced this is right.
> >> >
> >> > From your test case, the problem isn't because we have journaled
> >> > metadata blocks (which is what bufs) counts, but because fsync()
> >> > depends on data blocks also getting flushed out to disks.
> >> >
> >> > However, if we aren't closing the transaction because of fsync(), I
> >> > don't think we need to do a barrier in the case of an external
> >> > journal. So instead of effectively unconditionally setting
> >> > t_flushed_data_blocks (since bufs is nearly always going to be
> >> > non-zero), I think the better fix is to test to see if the journal
> >> > device != to the fs data device in fsync(), and if so, start the
> >> > barrier operation there.
> >> >
> >> > Do you agree?
> >> Yes.
> >
> > Just to be clear, since I realized I wrote fsync() when I should have
> > written fs/ext4/fsync.c, my proposal was to put this check in
> > ext4_sync_file().
> This means that we will end up with two io-barriers in a row
> for external journal case:
> ext4_sync_file()
> ->jbd2_log_start_commit()
> ->blkdev_issue_flush(j_fs_dev)
> /* seems following flush is mandatory to guarantee the metadata
> * consistency */
> ->blkdev_issue_flush(j_fs_dev)
>
> may be it will be better to pass some sort of barrier flag to
> jbd2_log_start_commit() this function mark journal->j_commit_request
> with ->t_flushed_data_blocks = 1, so io-carrier will be issued
> even if transaction has only metadata
> Advantages:
> 1) approach will works for all data modes
> 2) only one io-barrier is needed to guarantee the data and
> metadata consiscency.
> 3) changes not so complicated.
>
> I've already started to work with the patch but was surprised with
> commit logic.
> ->__jbd2_log_start_commit(target)
> journal->j_commit_request = target
> ->wake_up(&journal->j_wait_commit)
> ->kjournald2
> transaction = journal->j_running_transaction
> ->jbd2_journal_commit_transaction(journal);
> commit_transaction = journal->j_running_transaction
> ...
> /* So j_commit_request is used only for comparison. But actually
> committing journal->j_running_transaction->t_tid transaction
> instead of j_commit_request. Why?
> */
Remember: There is always at most one running transaction and at most one
transaction in the committing state - all the JBD2 and ext4 code heavily relies
on this. So in this case, because kjournald isn't committing any transaction,
we know that there is no committing transaction and at most one running
transaction whose tid = journal->j_commit_sequence + 1. So actually the only
trasaction we can commit is the running one and it does not even make sence to
try committing another one (the check in log_start_commit actually makes sure
that j_commit_request cannot be set to a lower value than it is and the code in
kjournald2 makes sure that j_commit_request >= j_commit_sequence. So I think
the code works right (although I agree it's hard to read - it's an old legacy
;).
Honza
--
Jan Kara <jack@...e.cz>
SuSE CR Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists