lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191016202648.GG11103@mit.edu>
Date:   Wed, 16 Oct 2019 16:26:48 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: [PATCH v3 07/13] ext4: track changed files for fast commit

On Tue, Oct 01, 2019 at 12:40:56AM -0700, Harshad Shirwadkar wrote:
> +void ext4_fc_enqueue_inode(handle_t *handle, struct inode *inode)
> +{
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> +	tid_t running_txn_tid = get_running_txn_tid(inode->i_sb);

BTW, we don't actually have to call get_running_txn_tid() here.  We
have a handle, which means we know there is a running transaction, so
we can also just do:

	tid_t running_txn_tid = handle->h_transaction->t_id;

> +
> +	if (!ext4_should_fast_commit(inode->i_sb))
> +		return;
> +
> +	spin_lock(&sbi->s_fc_lock);

This is going to be a major lock contention bottleneck.  So we should
move the the write_lock of &ei->i_fc.fc_lock and comparison of
ei->i_fc.fc_tid against running_txn_tid before we try to take the file
system-level s_fc_lock.

> +	if (!sbi->s_fc_eligible) {
> +		spin_unlock(&sbi->s_fc_lock);
> +		return;
> +	}

I'm really not fond the file system level s_fc_eligible; again, I
really think we should have a transaction-level "this transaction is
not eligible for fast commit" flag.  We don't have to be super careful
about locking for this flag anyway, since it only transitions from set
to unset, and here in ext4_fc_enqueue_inode(), it's only an
optimization to avoid doing extra unnecessary work.

> +static inline void
> +ext4_fc_mark_ineligible(struct inode *inode)
> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> +	struct ext4_inode_info *ei = EXT4_I(inode);
> +
> +	write_lock(&ei->i_fc.fc_lock);
> +	if (sbi->s_journal)
> +		ei->i_fc.fc_tid = sbi->s_journal->j_commit_sequence + 1;

Use get_running_txn_tid() instead?

> +	ei->i_fc.fc_eligible = false;
> +	write_unlock(&ei->i_fc.fc_lock);
> +	spin_lock(&sbi->s_fc_lock);
> +	sbi->s_fc_eligible = false;
> +	spin_unlock(&sbi->s_fc_lock);
> +}
> +

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index f230a888eddd..6d2efbd9aba9 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -279,6 +280,8 @@ void ext4_evict_inode(struct inode *inode)
>  	if (ext4_inode_is_fast_symlink(inode))
>  		memset(EXT4_I(inode)->i_data, 0, sizeof(EXT4_I(inode)->i_data));
>  	inode->i_size = 0;
> +	ext4_fc_del(inode);
> +	ext4_fc_mark_ineligible(inode);

Why is ext4_fc_mark_ineligible() needed here?

> @@ -326,6 +330,8 @@ void ext4_evict_inode(struct inode *inode)
>  	 * having errors), but we can't free the inode if the mark_dirty
>  	 * fails.
>  	 */
> +	ext4_fc_del(inode);
> +	ext4_fc_mark_ineligible(inode);

Same question here....

> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 442f7ef873fc..a8e23acb5c03 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -987,6 +987,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		err = mnt_want_write_file(filp);
>  		if (err)
>  			return err;
> +		ext4_fc_mark_sb_ineligible(sb);
>  		err = swap_inode_boot_loader(sb, inode);
>  		mnt_drop_write_file(filp);
>  		return err;

I don't think we need to mark the whole file system (transaction) as
ineligible.  We just have to mark the two inodes being marked as
ineligible, no?

> @@ -997,6 +998,8 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		int err = 0, err2 = 0;
>  		ext4_group_t o_group = EXT4_SB(sb)->s_groups_count;
>  
> +		ext4_fc_mark_sb_ineligible(sb);
> +
>  		if (copy_from_user(&n_blocks_count, (__u64 __user *)arg,
>  				   sizeof(__u64))) {
>  			return -EFAULT;

This is the resize ioctl, and this is the one place where we need to
mark the whole transaction as fc ineligible, since some other
subsequent handle might try to allocate blocks or inodes that were
created as the result of EXT4_IOC_RESIZE_FS.

But we shouldn't actually do it here; we should do it whenever we
start a handle that tries to resize the file system, since it is
*that* transaction that we need to make sure is made ineligible.
Otherwise there can be races where we set the flag in sbi, but before
we have a chance to start the handle which does (part of) the resize
operation, it gets cleared because another transaction committed
first.

We similarly need to mark the transaction is ineligible for any
handles created as the result of EXT4_IOC_GROUP_ADD and
EXT4_IOC_GROUP_EXTEND.  (Which are the old/legacy resize ioctl.)

> diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> index b1e4d359f73b..b995690d73ce 100644
> --- a/fs/ext4/migrate.c
> +++ b/fs/ext4/migrate.c
> @@ -513,6 +513,7 @@ int ext4_ext_migrate(struct inode *inode)
>  		 * work to orphan_list_cleanup()
>  		 */
>  		ext4_orphan_del(NULL, tmp_inode);
> +		ext4_fc_del(inode);

This should be tmp_inode, not inode; and I don't think it's needed,
since the tmp inode will never have been fast commit enqueued.

> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 491f9ee4040e..19bc4046658c 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1406,6 +1406,7 @@ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode,
>  	inode_unlock(ea_inode);
>  
>  	ext4_mark_inode_dirty(handle, ea_inode);
> +	ext4_fc_enqueue_inode(handle, ea_inode);

If we modify an external xattr block, or if we need to create (or
modify the ref count) on an EA inode, we need to disable fast commit
on the inode whose xattrs we are manipulating.  Could you add that
logic, please?

We could add support for writing out the external xattr block to the
fast commit log if it has been modified, but that's a fast commit
change in its journal format.

					- Ted

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ