linux-kernel - Re: [PATCH 0/4] Fix filesystem freezing

Message-ID: <20120113110759.GB13641@quack.suse.cz>
Date:	Fri, 13 Jan 2012 12:07:59 +0100
From:	Jan Kara <jack@...e.cz>
To:	Dave Chinner <david@...morbit.com>
Cc:	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>, linux-ext4@...r.kernel.org,
	xfs@....sgi.com, Eric Sandeen <sandeen@...deen.net>,
	Dave Chinner <dchinner@...hat.com>,
	Surbhi Palande <csurbhi@...il.com>,
	Kamal Mostafa <kamal@...onical.com>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH 0/4] Fix filesystem freezing

On Fri 13-01-12 11:09:32, Dave Chinner wrote:
> On Thu, Jan 12, 2012 at 12:30:31PM +0100, Jan Kara wrote:
> > On Thu 12-01-12 13:48:41, Dave Chinner wrote:
> > > On Thu, Jan 12, 2012 at 02:20:49AM +0100, Jan Kara wrote:
> > > > 
> > > >   Hello,
> > > > 
> > > >   filesystem freezing is currently racy and thus we can end up with dirty data
> > > > on frozen filesystem (see changelog of the first patch for detailed race
> > > > description and proposed fix). This patch series aims at fixing this.
> > > 
> > > It only fixes the dirty data race (i.e. SB_FREEZE_WRITE). The same
> > > race conditions exist for SB_FREEZE_TRANS on XFS, and so need the
> > > same fix. That race has had one previous attempt at fixing it in
> > > XFS but that's not possible:
> > > 
> > > b2ce397 Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
> > > 7a249cf xfs: fix filesystsem freeze race in xfs_trans_alloc
> > > 
> > > > It was looking at that problem earlier today that led to the
> > > > solution Eric proposed. Essentially the method in these patches
> > > > needs to replace the XFS-specific m_active_trans counter and delay
> > > > during ->fs_freeze to prevent that race condition....
> >   OK, I see. I just checked ext4 to make sure and ext4 seems to get this
> > right. Looking into Christoph's original patch it shouldn't be hard to fix
> > it. Instead of:
> >         atomic_inc(&mp->m_active_trans);
> >
> >         if (wait_for_freeze)
> >                 xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
> > 
> > we just need to do something a bit more elaborate:
> > 
> > retry:
> >         if (wait_for_freeze)
> >                 xfs_wait_for_freeze(mp, SB_FREEZE_TRANS);
> >         atomic_inc(&mp->m_active_trans);
> >         if (wait_for_freeze && mp->m_super->s_frozen >= SB_FREEZE_TRANS) {
> >                 atomic_dec(&mp->m_active_trans);
> >                 goto retry;
> >         }
> > 
> > Or does XFS support nested transactions (i.e. a thread already holding a
> > running transaction can call into xfs_trans_alloc() again)?
> > That would make things more complicated...
> 
> You're still missing the point - that this isn't an XFS specific
> problem or that the write problem is an ext4 specific problem. The
> problem is that these are freeze state transition problems -
> something that can affect every filesystem because the freeze code
> is generic.  Quite frankly, I'm not interested in having a generic
> solution for SB_FREEZE_WRITE and a custom, per filesystem solution
> for SB_FREEZE_TRANS when the solution is exactly the same.
  I understand that both state transitions are currently racy. It's just
that ext3, ext4, reiserfs, gfs2, and btrfs do not really care about the
SB_FREEZE_TRANS transition because they all grew their own synchronization
mechanisms for that. XFS is the only filesystem I know of which really
relies on this transition. That's why I originally decided to fix up the
SB_FREEZE_TRANS transition only in XFS and not in VFS. But on second
thought I tend to agree with you that VFS should provide a way to do a
race-free transition to both states so that filesystems that want to use it
can use it. So I'll add a second counter for that.
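
  To make the idea concrete, here is a minimal userspace sketch of what
such a pair of counters could look like. It is only an illustration under
stated assumptions, not the actual patch: the names freeze_state,
freeze_try_ref(), freeze_drop(), freeze_drained() and freeze_begin() are
hypothetical, C11 atomics stand in for the kernel's atomic_t, and the caller
is expected to sleep and retry where the kernel would block in something
like xfs_wait_for_freeze(). It also ignores the nested-reference deadlock
discussed below.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical levels, ordered like SB_FREEZE_WRITE < SB_FREEZE_TRANS. */
enum freeze_level { FREEZE_NONE = 0, FREEZE_WRITE = 1, FREEZE_TRANS = 2 };

/* Hypothetical per-superblock state: one counter per freeze level. */
struct freeze_state {
        atomic_int frozen;      /* current freeze level, like s_frozen */
        atomic_int active[2];   /* [0] active writers, [1] active transactions */
};

static void freeze_init(struct freeze_state *fs)
{
        atomic_init(&fs->frozen, FREEZE_NONE);
        atomic_init(&fs->active[0], 0);
        atomic_init(&fs->active[1], 0);
}

/*
 * Try to take a reference that blocks freezing at 'level'.
 * Returns false if a freeze at that level (or deeper) is in progress;
 * the caller would then wait for a thaw and retry.
 */
static bool freeze_try_ref(struct freeze_state *fs, enum freeze_level level)
{
        if (atomic_load(&fs->frozen) >= level)
                return false;
        atomic_fetch_add(&fs->active[level - 1], 1);
        /* Recheck after the increment to close the race with the freezer. */
        if (atomic_load(&fs->frozen) >= level) {
                atomic_fetch_sub(&fs->active[level - 1], 1);
                return false;
        }
        return true;
}

/* Drop a reference previously taken with freeze_try_ref(). */
static void freeze_drop(struct freeze_state *fs, enum freeze_level level)
{
        int old = atomic_fetch_sub(&fs->active[level - 1], 1);

        assert(old > 0);
        (void)old;
}

/* True once no reference holders remain at 'level'. */
static bool freeze_drained(struct freeze_state *fs, enum freeze_level level)
{
        return atomic_load(&fs->active[level - 1]) == 0;
}

/*
 * Freezer side: publish the new level first, then look at the counter.
 * Returns true if the level is already drained; otherwise the freezer
 * would wait until freeze_drained() becomes true.
 */
static bool freeze_begin(struct freeze_state *fs, enum freeze_level level)
{
        atomic_store(&fs->frozen, level);
        return freeze_drained(fs, level);
}
```

  The ordering is what matters here: the reference taker increments and
then rechecks the level, while the freezer sets the level and then reads
the counter, so with sequentially consistent atomics at least one side
notices the other.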
 
> > Using sb_start_write() instead of m_active_trans won't be that easy because
> > it can create A-A deadlocks (e.g. we do sb_start_write in
> > block_page_mkwrite() and then xfs_get_blocks() decides to start a
> > transaction and calls sb_start_write() again which might block if
> > filesystem freezing started in the meantime).
> 
> So, like Eric said in his first email, it's not a "write start/end"
> interface that is needed, the interface has to work with different
> freeze levels (e.g. "sb_freeze_ref(sb, level)/sb_freeze_drop(sb,
> level)").  Sure, internally it would have to map to two counters and
> different level checks, but it solves the same problem for all
> levels of freeze for all filesystems.
> 
> Let's fix this freeze problem once and for all in the generic code,
> and not have to keep coming back to it to add more functionality for
> different situations the most recent fix didn't handle for random
> filesystem X....
  Yeah. I think ext3/4 could be converted to the generic mechanism
(although it won't be completely trivial since it uses the internal
mechanism also for other things than filesystem freezing).
								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR