linux-ext4 - Re: [PATCH 3/3] ext4: fix block swap procedure on migration V2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20111029125451.GC19536@thunk.org>
Date:	Sat, 29 Oct 2011 08:54:51 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Dmitry Monakhov <dmonakhov@...nvz.org>
Cc:	Andreas Dilger <adilger@...ger.ca>, linux-ext4@...r.kernel.org,
	aneesh.kumar@...ux.vnet.ibm.com
Subject: Re: [PATCH 3/3] ext4: fix block swap procedure on migration V2

On Mon, Sep 19, 2011 at 08:45:32PM +0400, Dmitry Monakhov wrote:
> > It would potentially be better to just leave the inode off the orphan
> > list, and in the _extremely_ rare case that there is a crash during
> > inode migration the inode is leaked until e2fsck is run.  The migrate
> > will happen at most once for any filesystem, so the loss of a single
> > inode per crash will not be a serious issue IMHO.
>
> But we still need to tell e2fsck that it is not just an average inode,
> and it should be cleaned up without touching it's data blocks. Otherwise
> fsck will complain about blocks are referenced by multiple inodes, which
> is really scary message. So IMHO we still need persistent flag. 

The simplest way to solve this is to keep the n_links count on the
inode to be zero.  That will prevent e2fsck from considering the inode
as being in use, so it will simply not consider the blocks owned by
the temporary inode.  What this would mean is that we will leak the
temporary inode as well as the newly allocated index blocks until the
next e2fsck, but that's not a disaster, and it is a very rare case.

If we wanted to optimize things further, we could also add support to
__ext4_get_inode_loc(), ext4_mark_inode_dirty(), and
ext4_write_inode() so that if the inode number is some magic number,
such as inode number 1 (which as the bad block inode we never touch
directly), that it is to be considered an in-memory inode only and so
we don't even bother writing it to do disk or journaling to it.  That
way the migration code can use an in-memory inode which is allocated
by ext4_ext_migrate(), and whose only existence is a pointer in that
kernel stack frame to an in-memory inode, which has no existence on
disk.

With this optimized approach, the only thing we will leak is one,
maybe two extent tree index blocks *if* we happen to be migrating an
inode during the time of a system crash.  We could even avoid that by
adding support for blocks which are allocated in memory, but not (yet)
pushed out to disk, which we may need for some of the write path
improvements.  But if we don't get to that right away, I think that's
fine....

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html