[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 06 Jun 2007 15:32:27 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To: Theodore Tso <tytso@....edu>
CC: linux-ext4@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] e2fsprogs: Add undo I/O manager.
Theodore Tso wrote:
> On Tue, May 22, 2007 at 03:47:33PM +0530, Aneesh Kumar K.V wrote:
>> This I/O manager saves the contents of the location being overwritten
>> to a tdb database. This helps in undoing the changes done to the
>> file system.
>>
>> + /* loop through the existing entries and find if they overlap */
>> + for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
>> + data = tdb_fetch(tdb, key);
>> + /*
>> + * we are not interested in the data
>> + */
>> + free(data.dptr);
>
> Youch. This is going to be painful; it means that for every single
> write, we need to iterate over every single entry in the TDB. This
> doesn't scale terribly well. :-(
>
> I suspect you will do much better by using a different encoding for
> the tdb database; namely, one where you don't use an extent-based
> encoding, but rather something which is strictly block based. So that
> means a separate entry in the tdb database for each block entry. That
> will be slightly more inefficient in terms of overhead stored in the
> tdb database, yes, but as a percentage of the blocksize (typically
> 4k), the overhead isn't that great, and the performance improvement
> will be huge.
>
> The downside of doing this is that you need to take into account
> changes in blocksize via set_blksize() (this is rare; it's only done
> by mke2fs, and then only to zap various MD superblocks, et. al), and
> if the size is negative (in which case it represents the size in
> bytes, which could be more or less than a blocksize, and not
> necessarily a multiple of a blocksize).
>
If we allow to change the block size in between that would mean the
records that we store in the tdb database will be of variable size (
different block sizes). That would also add all the code/complexity that
i have in is_trans_overlapping. So if we are looking at avoiding the
above for() loop then we should have constant block size (4K ?). But in
your above statement, you are counting overhead as a percentage of
blocksize. So how do we handle this ?
Now with constant block size The write_blk gets complex because of there
won't be a 1:1 mapping between the block number we need to use in
tdb_database and the backing I/O manager.
> But that's not too bad, since now you just have to do is figure out
> which block(s) need to be saved, and then check to see if a record
> already exists in the tdb; if it does, the original value has been
> saved, and you don't have to do anything; if it doesn't then you just
> save the entire block. The undo manager need to save the first
> blocksize specified, and backup the original data in the tdb file in
> terms of that first blocksize, regardless of any later changes in the
> blocksize (which as I said is rare).
>
>> + if (req_loc < cur_loc) {
>> + if (req_loc+req_size > cur_loc) {
>> + /* ------------------
>> + * | | | |
>> + * ------------------
>> + * rl cl rs cs
>> + */
>
> This looks like notes to yourself; but it's not obvious what's going
> on here. Fortunately with the above suggested change in algorith,
> most of this will go away, but please make your comments more obvious.
>
rl -> req_loc
cl -> cur_loc
rs -> req_size
cs -> cur_size
>> + // I have libtdb1. Enable after rebasing to latest e2fsprogs that has this API
>> + //tdb_transaction_start(data->tdb);
>
> What version of e2fsprogs are you developing against?
>
Right now i am manually linking it against libtdb.
dpkg --search /usr/lib/libtdb.so
tdb-dev: /usr/lib/libtdb.so
>> + /* FIXME!! Not sure what should be the error */
>> + //tdb_transaction_cancel(data->tdb);
>> + retval = EXT2_ET_SHORT_WRITE;
>> + goto error_out;
>
> I would add a new error code in lib/ext2fs/ext2_err.et.in; that way we
> can have a very specific error code explaining that we have a failure
> to save the original version of the block.
>
Will do this
>> + actual = read(data->dev, tdb_data.dptr, size);
>> + if (actual != size) {
>
> Instead of using ext2fs_llseek() and read(), please use the underlying
> io_manager. That way the undo manager might have a chance to work on
> some other io manager other than unix_io. Yes, that means that you
> might in some cases need to save and restore the current blksize. But
> that's not going to be the common case, and it will make the code much
> more general.
>
>> +error_out:
>> + /* Move the location back */
>> + if (ext2fs_llseek(data->dev, cur_loc, SEEK_SET) != cur_loc) {
>> + retval = errno ? errno : EXT2_ET_LLSEEK_FAILED;
>> + goto error_out;
>> + }
>
> Why do you need to move the location back? As far as I know nothing
> should be depending on current offset of the file descriptor.
>
>
No specific reason.
-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists