linux-ext4 - Re: [PATCH] jbd2: Fix a race between checkpointing code and journal_get_write

Message-ID: <20090710100713.GB17524@duck.suse.cz>
Date:	Fri, 10 Jul 2009 12:07:13 +0200
From:	Jan Kara <jack@...e.cz>
To:	Theodore Tso <tytso@....edu>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] jbd2: Fix a race between checkpointing code and
	journal_get_write_access()

On Wed 08-07-09 18:31:50, Theodore Tso wrote:
> On Sun, Jul 05, 2009 at 10:53:19PM -0400, Theodore Tso wrote:
> > On Wed, Jun 24, 2009 at 06:02:40PM +0200, Jan Kara wrote:
> > > The following race can happen:
> > > 
> > >   CPU1                          CPU2
> > >                                 checkpointing code checks the buffer, adds
> > >                                   it to an array for writeback
> > > do_get_write_access()
> > >   ...
> > >   lock_buffer()
> > >   unlock_buffer()
> > >                                   flush_batch() submits the buffer for IO
> > >   __jbd2_journal_file_buffer()
> > > 
> > >   So a buffer under writeout is returned from do_get_write_access(). Since
> > > the filesystem code relies on the fact that journaled buffers cannot be
> > > written out, it does not take the buffer lock and so it can modify the
> > > buffer while it is under writeout. That can lead to filesystem corruption
> > > if we crash at the right moment.
> > >   We fix the problem by clearing the buffer dirty bit under buffer_lock
> > > even if the buffer is on the BJ_None list. Actually, we clear the dirty
> > > bit regardless of the list the buffer is on and warn about the fact if
> > > the buffer is already journalled.
> 
> When running fsstress, we get the "Spotted dirty metadata buffer;
> there's a risk of filesystem corruption in case of a system crash" at
> least half a dozen times or so.  That sounds like we have a problem.
> Were you expecting that this was a "this should never happen"
> situation, or is there a known bug that we need to fix here?
  Yes, it should be a "this should never happen" situation, unless you run
something like tune2fs while the filesystem is mounted. But looking at the
code, I missed that the buffer can also be dirty in
jbd2_journal_get_create_access(), because jbd2_journal_forget() does not
clear the dirty bit when the buffer is just being committed. The attached
patch should fix it. Thanks for the report.

								Honza

-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
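
The idea behind the fix discussed in this thread (clear the dirty bit while holding the buffer lock, so that a later writeback attempt finds nothing to submit) can be modelled in user space. The following is only a rough sketch under stated assumptions, not the kernel code or the attached patch: `struct model_buffer`, `model_buffer_init()`, `model_get_write_access()` and `model_flush_one()` are hypothetical names, a pthread mutex stands in for `lock_buffer()`/`unlock_buffer()`, and the flush side is assumed to re-check the dirty bit under the same lock before submitting IO.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a buffer; not the kernel's buffer_head. */
struct model_buffer {
    pthread_mutex_t lock;   /* models lock_buffer()/unlock_buffer() */
    bool dirty;             /* models the buffer dirty bit */
    int io_submitted;       /* number of times the buffer was sent for writeback */
};

static void model_buffer_init(struct model_buffer *b, bool dirty)
{
    pthread_mutex_init(&b->lock, NULL);
    b->dirty = dirty;
    b->io_submitted = 0;
}

/*
 * Models the writer side after the fix: the dirty bit is cleared under the
 * buffer lock no matter which list the buffer is on. Returns true if the
 * buffer was found dirty, which a caller could use to emit a warning.
 */
static bool model_get_write_access(struct model_buffer *b)
{
    bool was_dirty;

    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    was_dirty = b->dirty;
    b->dirty = false;
    pthread_mutex_unlock(&b->lock);
    return was_dirty;
}

/*
 * Models the checkpoint flush (assumption: it tests and clears the dirty bit
 * under the same lock). Returns true only if the buffer was submitted.
 */
static bool model_flush_one(struct model_buffer *b)
{
    bool submit;

    assert(b != NULL);
    pthread_mutex_lock(&b->lock);
    submit = b->dirty;
    if (submit) {
        b->dirty = false;
        b->io_submitted++;
    }
    pthread_mutex_unlock(&b->lock);
    return submit;
}
```

In this model, if `model_get_write_access()` runs before `model_flush_one()` on a buffer that the checkpointing side had already queued, the flush finds the dirty bit clear and submits nothing, so the writer never modifies a buffer that is under writeout.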

View attachment "0001-jbd2-Clear-dirty-bit-in-jbd2_journal_get_create_acc.patch" of type "text/x-patch" (1594 bytes)