Date:	Thu, 27 Nov 2014 10:35:37 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Andreas Dilger <adilger@...ger.ca>
Cc:	Theodore Ts'o <tytso@....edu>,
	Linux Filesystem Development List 
	<linux-fsdevel@...r.kernel.org>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Linux btrfs Developers List <linux-btrfs@...r.kernel.org>,
	XFS Developers <xfs@....sgi.com>
Subject: Re: [PATCH-v4 6/7] ext4: add support for a lazytime mount option

On Wed, Nov 26, 2014 at 04:10:44PM -0700, Andreas Dilger wrote:
> On Nov 26, 2014, at 3:48 PM, Dave Chinner <david@...morbit.com> wrote:
> > 
> > On Wed, Nov 26, 2014 at 05:23:56AM -0500, Theodore Ts'o wrote:
> >> Add an optimization for the MS_LAZYTIME mount option so that we will
> >> opportunistically write out any inodes with the I_DIRTY_TIME flag set
> >> in a particular inode table block when we need to update some inode
> >> in that inode table block anyway.
> >> 
> >> Also add some temporary code so that we can set the lazytime mount
> >> option without needing a modified /sbin/mount program which can set
> >> MS_LAZYTIME.  We can eventually make this go away once util-linux has
> >> added support.
> >> 
> >> Google-Bug-Id: 18297052
> >> 
> >> Signed-off-by: Theodore Ts'o <tytso@....edu>
> >> ---
> >> fs/ext4/inode.c             | 49 ++++++++++++++++++++++++++++++++++++++++++---
> >> fs/ext4/super.c             |  9 +++++++++
> >> include/trace/events/ext4.h | 30 +++++++++++++++++++++++++++
> >> 3 files changed, 85 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index 5653fa4..8308c82 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -4140,6 +4140,51 @@ static int ext4_inode_blocks_set(handle_t *handle,
> >> }
> >> 
> >> /*
> >> + * Opportunistically update the other time fields for other inodes in
> >> + * the same inode table block.
> >> + */
> >> +static void ext4_update_other_inodes_time(struct super_block *sb,
> >> +					  unsigned long orig_ino, char *buf)
> >> +{
> >> +	struct ext4_inode_info	*ei;
> >> +	struct ext4_inode	*raw_inode;
> >> +	unsigned long		ino;
> >> +	struct inode		*inode;
> >> +	int		i, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
> >> +	int		inode_size = EXT4_INODE_SIZE(sb);
> >> +
> >> +	ino = orig_ino & ~(inodes_per_block - 1);
> >> +	for (i = 0; i < inodes_per_block; i++, ino++, buf += inode_size) {
> >> +		if (ino == orig_ino)
> >> +			continue;
> >> +		inode = find_active_inode_nowait(sb, ino);
> >> +		if (!inode ||
> >> +		    (inode->i_state & I_DIRTY_TIME) == 0 ||
> >> +		    !spin_trylock(&inode->i_lock)) {
> >> +			iput(inode);
> >> +			continue;
> >> +		}
> >> +		inode->i_state &= ~I_DIRTY_TIME;
> >> +		inode->i_ts_dirty_day = 0;
> >> +		spin_unlock(&inode->i_lock);
> >> +		inode_requeue_dirtytime(inode);
> >> +
> >> +		ei = EXT4_I(inode);
> >> +		raw_inode = (struct ext4_inode *) buf;
> >> +
> >> +		spin_lock(&ei->i_raw_lock);
> >> +		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
> >> +		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
> >> +		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
> >> +		ext4_inode_csum_set(inode, raw_inode, ei);
> >> +		spin_unlock(&ei->i_raw_lock);
> >> +		trace_ext4_other_inode_update_time(inode, orig_ino);
> >> +		iput(inode);
> >> +	}
> >> +}
> > 
> > Am I right that this now does unlogged timestamp updates of
> > inodes? What happens when that buffer gets overwritten by log
> > recovery after a crash? Do the timestamp updates get lost?
> > 
> > FYI, XFS has had all sorts of nasty log recovery corner cases
> > caused by log recovery overwriting non-logged inode updates like
> > this. In the past few years we've removed every single non-logged
> > inode update "optimisation" so that all changes (including timestamps)
> > are transactional, and so we no longer end up with inode state on disk
> > not matching what log recovery wrote to disk for all the other inode
> > metadata...
> > 
> > Optimistic unlogged inode updates are a slippery slope, and history
> > tells me that it doesn't lead to a nice place....
> 
> Since ext4/jbd2 is logging the whole block, unlike XFS which is doing
> logical journaling, this isn't an unlogged update.  It is just taking
> advantage of the fact that the whole block is going to be logged and
> written to the disk anyway.

Urk - that's worse, isn't it? i.e. the code above calls iput() from
within a current transaction context?  What happens if that drops
the last reference to the inode and it gets evicted due to racing
with an unlink? Won't that try to start another transaction to free
the inode (i.e. through ext4_evict_inode())?
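
To make the ordering concrete, here is a minimal userspace sketch of the scenario, not kernel code. Every name in it (toy_inode, toy_txn_start(), toy_evict(), toy_iput(), toy_run_scenario()) is hypothetical and exists only for illustration; it assumes that evicting an unlinked inode needs to start its own transaction, and it simply counts how often a transaction start happens while another one is still open. Whether such a nested start is actually harmful depends on the journalling layer, which this toy does not model.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_inode {
	int refcount;   /* references held on the in-memory inode */
	int nlink;      /* 0 means the file has been unlinked */
};

static int txn_depth;       /* > 0 while a "transaction" is open */
static int nested_starts;   /* starts that happened inside an open one */

static void toy_txn_start(void)
{
	if (txn_depth > 0)
		nested_starts++;    /* the situation being asked about */
	txn_depth++;
}

static void toy_txn_stop(void)
{
	txn_depth--;
}

/* Stand-in for eviction: freeing an unlinked inode needs a transaction. */
static void toy_evict(struct toy_inode *inode)
{
	if (inode->nlink == 0) {
		toy_txn_start();
		/* ... the on-disk inode would be freed here ... */
		toy_txn_stop();
	}
}

/* Stand-in for iput(): dropping the last reference triggers eviction. */
static void toy_iput(struct toy_inode *inode)
{
	if (inode && --inode->refcount == 0)
		toy_evict(inode);
}

/* Returns the number of nested transaction starts seen in the scenario. */
static int toy_run_scenario(bool unlink_races)
{
	struct toy_inode other = { .refcount = 1, .nlink = 1 };

	txn_depth = 0;
	nested_starts = 0;

	toy_txn_start();            /* updating some inode in the block */
	other.refcount++;           /* lookup of a neighbouring inode takes a ref */
	if (unlink_races) {
		other.nlink = 0;    /* concurrent unlink ... */
		other.refcount--;   /* ... drops the other holder's reference */
	}
	toy_iput(&other);           /* may now be the final reference */
	toy_txn_stop();

	return nested_starts;
}
```

In this toy, toy_run_scenario(false) never reaches the eviction path because another reference is still held, while toy_run_scenario(true) does reach it with the outer transaction still open.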



>
> If the only update needed for other inodes
> in the block is the timestamp then they may as well be flushed to disk
> at the same time and avoid the need for another update later on.
> 
> Cheers, Andreas

-- 
Dave Chinner
david@...morbit.com