lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200803122022.22814.phillips@phunq.net>
Date:	Wed, 12 Mar 2008 19:22:21 -0800
From:	Daniel Phillips <phillips@...nq.net>
To:	"Duane Griffin" <duaneg@...da.com>
Cc:	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
	Theodore Tso <tytso@....edu>, sct@...hat.com,
	akpm@...ux-foundation.org, adilger@...sterfs.com
Subject: Re: [RFC, PATCH 0/6] ext3: do not modify data on-disk when mounting read-only filesystem

Hi Duane,

Thanks for doing this.  Some perhaps not so obvious fallout from the bad 
old way of doing things is that ddnap (zumastor) hits an issue in 
replication.  Since ddsnap allows journal replay on the downstream 
server and also needs to have an unaltered snapshot to apply deltas 
against, if we do not take special care, Ext3 will come along and 
modify the downstream snapshot even when told not to.  Our solution: 
take two snapshots per replication cycle (pretty cheap) so that one can 
be clean and the other can be stepped on at will by the journal replay.
Ugh.

With your hack, we can eventually drop the double snapshot, provided no 
other filesystem is similarly badly behaved.

Re your page translation table: we already have a page translation 
table, it is called the page cache.  If you could figure out which file 
(or metadata) each journal block belongs to, you could just load the 
page table pages back in and presto, done.  No need to replay the 
journal at all, you are already back to journal+disk = consistent 
state.

I probably have missed a detail or two since I haven't looked closely at 
how orphan inodes work, revokes, probably other things, but there is 
the basic idea.  SCT, does my reasoning hold water?  (In fact, 
ddsnap "replays" its own journal in exactly this way.  Cache state is 
reconstructed and no actual journal flush is performed.)

Anyway, this is just a theoretical comment, it is in no way a suggestion 
for a rewrite.  The reason for that being, you do not have any 
convenient way to map physical journal blocks back to files and 
metadata.  Maybe if we do implement reverse mapping for Ext3/4 later 
(not just a pipe dream) we could revisit this and lose your extra 
mapping.  As it stands your solution seems well built, after a quick 
readthrough.  Nice looking code.  I think you added about 250 lines 
overall, so tight too.  Thanks again.

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ