linux-ext4 - Re: Odd "leak" of extent info into data blocks?

Date:	Wed, 9 Sep 2009 11:19:11 -0400
From:	Theodore Tso <tytso@....edu>
To:	Curt Wohlgemuth <curtw@...gle.com>
Cc:	Valerie Aurora <vaurora@...hat.com>,
	ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Odd "leak" of extent info into data blocks?

On Tue, Sep 08, 2009 at 09:00:50PM -0700, Curt Wohlgemuth wrote:
> 
> > In ext3 and ext4, metadata blocks (such as
> > extent tree blocks), aren't stored in the page cache.
> 
> Hmm.  You're saying that in the absence of a journal, all metadata
> writes go direct to disk?  Where should I look for this in the code?

Sorry, let me be more precise.  All metadata writes, regardless of
whether a journal is present or not, are written via the buffer head
(bh) abstraction.  They have to be, because that's how we do our
journalling; the jbd/jbd2 layer is built on top of the bh I/O request
layer, and even when a journal is not present, we are still doing our
metadata I/O via the submit_bh and ll_rw_block interface.

It used to be the case (in Linux 2.4) that the buffer cache was stored
separately from the page cache.  In Linux 2.6, the buffer cache is
implemented on top of the page cache, so technically, the metadata
blocks are stored in the page cache; however, they are only *accessed*
via the buffer cache abstraction.

> The problem is that I've seen this in real life.  And the patch below
> seems to fix it.  (Unfortunately, I haven't been able to recreate this
> in a simple example, after several days work.  I've only seen this in
> a *very* small number of cases on heavily loaded machines.)

I believe that you have a problem.  The problem is that you have a
dirty bh which is getting written out after the block gets reallocated
for use as a data block.  But a bforget() call should solve the problem
just as well.  In fact, I think the real fix should be this:

commit 1b58b00e02893b4bbab2b5f137316b82feadac52
Author: Theodore Ts'o <tytso@....edu>
Date:   Wed Sep 9 11:18:42 2009 -0400

    ext4: Use bforget() in no journal mode when in ext4_journal_forget()

    When ext4 is using a journal, a metadata block which is deallocated
    must be passed into the journal layer so it can be "revoked".  The
    jbd2_journal_forget() function is also responsible for calling
    bforget().  Without a journal, ext4_journal_forget() must call
    bforget(), to avoid a race from a dirty metadata block getting written
    back after it has been reallocated and reused for another inode's data
    block.

    Signed-off-by: "Theodore Ts'o" <tytso@....edu>

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index eb27fd0..d4f4b39 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -44,7 +44,7 @@ int __ext4_journal_forget(const char *where, handle_t *handle,
 						  handle, err);
 	}
 	else
-		brelse(bh);
+		bforget(bh);
 	return err;
 }
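
To illustrate the race described above, here is a minimal userspace
sketch.  It is not kernel code: struct toy_bh, toy_disk, and the toy_*
helpers are hypothetical stand-ins for struct buffer_head, the block
device, brelse(), bforget(), and writeback.  It also assumes, for
simplicity, that the new owner's data reaches the disk before the stale
buffer is flushed.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical, simplified stand-in for the kernel's struct buffer_head. */
struct toy_bh {
	char data[TOY_BLOCK_SIZE];
	int dirty;
	int refcount;
};

/* A single on-disk block that is first metadata and later file data. */
static char toy_disk[TOY_BLOCK_SIZE];

/* Like brelse(): drop the reference, leave the dirty bit alone. */
static void toy_brelse(struct toy_bh *bh)
{
	assert(bh->refcount > 0);
	bh->refcount--;
}

/* Like bforget(): discard the dirty state, then drop the reference. */
static void toy_bforget(struct toy_bh *bh)
{
	assert(bh->refcount > 0);
	bh->dirty = 0;
	bh->refcount--;
}

/* Background writeback: flush the buffer only if it is still dirty. */
static void toy_writeback(struct toy_bh *bh)
{
	if (bh->dirty) {
		memcpy(toy_disk, bh->data, TOY_BLOCK_SIZE);
		bh->dirty = 0;
	}
}

/*
 * Returns 1 if the stale metadata buffer clobbers the reallocated data
 * block, and 0 if the data block survives.
 */
int stale_metadata_overwrites_data(int use_bforget)
{
	struct toy_bh bh = { .dirty = 0, .refcount = 1 };

	memset(toy_disk, 0, TOY_BLOCK_SIZE);

	/* 1. An extent tree block is modified in memory and marked dirty. */
	strcpy(bh.data, "EXTENT-TREE");
	bh.dirty = 1;

	/* 2. The block is deallocated; no journal is present. */
	if (use_bforget)
		toy_bforget(&bh);
	else
		toy_brelse(&bh);

	/* 3. The block is reallocated and another inode's data lands on disk. */
	strcpy(toy_disk, "FILE-DATA");

	/* 4. Writeback later finds the old buffer. */
	toy_writeback(&bh);

	return strcmp(toy_disk, "FILE-DATA") != 0;
}
```

In this sketch, stale_metadata_overwrites_data(0) models the old
brelse() path and reports that the data block was overwritten, while
stale_metadata_overwrites_data(1) models the bforget() path and leaves
the data intact.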

						- Ted