Message-Id: <200803122022.22814.phillips@phunq.net>
Date: Wed, 12 Mar 2008 19:22:21 -0800
From: Daniel Phillips <phillips@...nq.net>
To: "Duane Griffin" <duaneg@...da.com>
Cc: linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
Theodore Tso <tytso@....edu>, sct@...hat.com,
akpm@...ux-foundation.org, adilger@...sterfs.com
Subject: Re: [RFC, PATCH 0/6] ext3: do not modify data on-disk when mounting read-only filesystem
Hi Duane,
Thanks for doing this. Some perhaps not so obvious fallout from the bad
old way of doing things is that ddsnap (Zumastor) hits an issue in
replication. Since ddsnap allows journal replay on the downstream
server and also needs to have an unaltered snapshot to apply deltas
against, if we do not take special care, Ext3 will come along and
modify the downstream snapshot even when told not to. Our solution:
take two snapshots per replication cycle (pretty cheap) so that one can
be clean and the other can be stepped on at will by the journal replay.
Ugh.
With your hack, we can eventually drop the double snapshot, provided no
other filesystem is similarly badly behaved.
Re your page translation table: we already have a page translation
table. It is called the page cache. If you could figure out which file
(or metadata) each journal block belongs to, you could just load the
journal blocks back into the page cache and presto, done. No need to
replay the journal at all: you are already back to the journal+disk =
consistent state.
I probably have missed a detail or two since I haven't looked closely at
how orphan inodes work, revokes, probably other things, but there is
the basic idea. SCT, does my reasoning hold water? (In fact,
ddsnap "replays" its own journal in exactly this way. Cache state is
reconstructed and no actual journal flush is performed.)
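To make the idea concrete, here is a toy sketch in Python, not kernel
code. It assumes a reverse map from physical block to (inode, page
index) is available, which Ext3 does not provide today, and it ignores
revokes and orphan inodes entirely. All names (rebuild_cache, read_page,
forward_map, reverse_map) are made up for illustration.

```python
from typing import Dict, List, Tuple

BlockNo = int
PageKey = Tuple[int, int]  # (inode number, page index within that file)


def rebuild_cache(journal: List[Tuple[BlockNo, bytes]],
                  reverse_map: Dict[BlockNo, PageKey]) -> Dict[PageKey, bytes]:
    """Load committed journal blocks into a page-cache-like dict.

    The journal is walked oldest first, so a later copy of the same
    block overwrites an earlier one. Nothing is written to disk.
    """
    cache: Dict[PageKey, bytes] = {}
    for block, data in journal:
        cache[reverse_map[block]] = data  # hypothetical reverse mapping
    return cache


def read_page(key: PageKey,
              cache: Dict[PageKey, bytes],
              forward_map: Dict[PageKey, BlockNo],
              disk: Dict[BlockNo, bytes]) -> bytes:
    """Read through the cache first, falling back to the untouched disk."""
    if key in cache:
        return cache[key]
    return disk[forward_map[key]]


# Toy "disk": inode 1 stands in for metadata, inode 7 for a regular file.
disk = {10: b"old-inode-table", 11: b"old-data", 12: b"untouched"}
forward_map = {(1, 0): 10, (7, 0): 11, (7, 1): 12}
reverse_map = {block: key for key, block in forward_map.items()}

# Committed journal records as (home block, contents), oldest first.
journal = [(10, b"new-inode-table"), (11, b"tmp-data"), (11, b"new-data")]

disk_before = dict(disk)
cache = rebuild_cache(journal, reverse_map)
```

Reads of (7, 0) now see the journaled contents while the dict standing
in for the disk is left exactly as it was, which is the property a
read-only mount (or a downstream snapshot) needs.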
Anyway, this is just a theoretical comment; it is in no way a suggestion
for a rewrite. The reason is that you do not have any
convenient way to map physical journal blocks back to files and
metadata. Maybe if we do implement reverse mapping for Ext3/4 later
(not just a pipe dream) we could revisit this and lose your extra
mapping. As it stands your solution seems well built, after a quick
readthrough. Nice looking code. I think you added about 250 lines
overall, so tight too. Thanks again.
Daniel