lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <E1KjlCM-0002L7-SO@closure.thunk.org>
Date:	Sat, 27 Sep 2008 21:35:46 -0400
From:	"Theodore Ts'o" <tytso@....edu>
To:	Alex Tomas <bzzz@....com>, Andreas Dilger <adilger@....com>
Cc:	linux-ext4@...r.kernel.org
Subject: Potential bug in mballoc --- reusing data blocks before txn commit

So while I was preparing a patch to delete the old legacy block
allocator, I came across this bit of code in balloc.c:

/**
 * ext4_test_allocatable()
 * @nr:                        given allocation block group
 * @bh:                        bufferhead contains the bitmap of the given bloc
 *
 * For ext4 allocations, we must not reuse any blocks which are
 * allocated in the bitmap buffer's "last committed data" copy.  This
 * prevents deletes from freeing up the page for reuse until we have
 * committed the delete transaction.
 *
 * If we didn't do this, then deleting something and reallocating it as
 * data would allow the old block to be overwritten before the
 * transaction committed (because we force data to disk before commit).
 * This would lead to corruption if we crashed between overwriting the
 * data and committing the delete.
 *
 * @@@ We may want to make this allocation behaviour conditional on
 * data-writes at some point, and disable it for metadata allocations or
 * sync-data inodes.
 */

This is done by searching an old copy of the bitmap found
jh->b_committed_data.

So I was more than a little bit concerned when I found that mballoc.c
wasn't referencing b_committed data *at* *all*.  When I looked a bit
more closely, it looks like that mballoc is using a separate scheme,
based on linked list hanging off of sbi->s_active_transaction.
Unfortunately, it seems to only prevent released metadata blocks from
being reused:

	if (metadata) {
		/* blocks being freed are metadata. these blocks shouldn't
		 * be used until this transaction is committed */
		ext4_mb_free_metadata(handle, &e4b, block_group, bit, count);
	} else {

This means that if a file is deleted, but then system crashes before the
commit, some of its data blocks could be reallocated and then written
out.  We could fix this by running all released blocks via
ext4_mb_free_metadata(), but since ext4_mb_free_metadata() stores blocks
that need to be protected via a block list, this could get *quite*
unwieldy, especially when deleting very large files (we could end up
with a very large linked list).  Alternate solutions would involve using
a list of extents, or the original jh->b_commited_data mechanism.

I'll also note that a linked list of extents that should be freed would
also be useful for implementing the trim command for SSD's --- and that
this would be much more cleanly implemented via a callback from the jbd2
layer when a commit is finished, rather than the current
ext4_mb_poll_new_transaction() mechanism.

In any case, is there a reason why the mballoc.c is using its current
scheme, and not using kj->b_commited_data as in the original balloc.c
code?  And was there a reason why you decided that it wasn't necessary
to protect freed data blocks from being reused until the transaction was
committed?

Regards,

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ