lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20091013201816.GF31440@duck.suse.cz>
Date:	Tue, 13 Oct 2009 22:18:16 +0200
From:	Jan Kara <jack@...e.cz>
To:	Chris Mason <chris.mason@...cle.com>
Cc:	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org,
	jbacik@...hat.com
Subject: Re: Fun with fdatasync()

On Tue 13-10-09 16:01:15, Chris Mason wrote:
> On Tue, Oct 13, 2009 at 09:23:57PM +0200, Jan Kara wrote:
> > On Tue 13-10-09 12:00:28, Chris Mason wrote:
> > > On Tue, Oct 13, 2009 at 12:00:43AM +0200, Jan Kara wrote:
> > > >   Hi,
> > > > 
> > > > On Mon 12-10-09 10:00:49, Chris Mason wrote:
> > > 
> > > [ clearing of I_DIRTY_DATASYNC by pdflush ]
> > > 
> > > > > 
> > > > > Am I missing something?  I don't see how fdatasync is safe in our
> > > > > current usage.
> > > >   Yeah, we already discussed similar problems I_DIRTY flags with Ted and
> > > > others in thread "fsync on ext[34] working only by an accident" on
> > > > linux-ext4.
> > > >   I don't quite like clearing dirty flags only on sync - pdflush would then
> > > > unnecessarily try to get rid of those inodes and burn CPU on them.
> > > > Actually, mapping->private_list (and bh->b_assoc_buffers) is meant to be
> > > > used exactly for the purpose of tracking what needs to be written on fsync
> > > > so my current plan is to somehow utilize that list to fix the problem.
> > > >   Maybe I even get to that tomorrow ;) Thanks for the reminder.
> > > 
> > > I honestly don't remember all the details now, but I know that when
> > > reiserfs stopped using the b_assoc_buffers stuff life got much less
> > > complex.  From an outsider's point of view the last thing jbd needs is
> > > another list of buffers to live on.
> > > 
> > > It seems like ext34 need to be able to answer 3 questions during an
> > > fsync or fdatasync:
> > > 
> > > The last transaction to change this file (fill hole, change
> > > i_size)
> > > 
> > > The last transaction to log this inode (for full fsync)
> > > 
> > > The last transaction committed such that fsync would consider it done.
> > > 
> > > Filling holes and changing i_size only happens from a handful of places,
> > > so it would be easy to update a transid field in the in-memory inode for
> > > that.
> > > 
> > > The inode logging code could bump a second transid field to catch all
> > > the other ways inodes change.
> > > 
> > > The transaction code could (or already does?) export an easy way to
> > > check the last commit.  Put the three together and you can safely jump
> > > out of fsync or fdatasync based on what the inode really needs instead
> > > of guessing with the I_ flags or page dirty bits.
> >   I was thinking about this solution as well (before thinking about
> > using b_assoc_queue). But to make this reliable we have to pin the inode
> > in memory until transactions modifying it (or its data) are committed.
> > Otherwise it could be reclaimed and we'd loose the information about
> > which transactions we have to wait for. And when we pin the inode, we'd
> > have to unpin it at transaction commit which implies we have to somehow
> > attach the inode to the transaction... That doesn't look more appealing
> > than b_assoc_queue in my opinion (OK, for ext4/JBD2 we have a way to
> > attach inodes to transactions but ext3/JBD does not have such way).
> 
> Most of the time the file is open from the time it is written to the
> time it is sunk.  For the cases where we lose the inode, we just find a
> zero in the last trans field and error on the side of doing the full
> fsync.  This will be a pretty rare event ;)
  You're probably right that fsyncing an inode that has been freshly loaded
from disk is rare enough so that we can take the penalty of unnecessary
transaction commit in such cases. OK, I'll implement it like that.  It
should be pretty straightforward. Thanks for the discussion.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ