Message-ID: <g57h2fwia2mefxiawafzlgaximj6i75aj2blglciuxzsc65mve@kvnh7fl2cwpz>
Date: Thu, 20 Mar 2025 12:05:03 +0100
From: Jan Kara <jack@...e.cz>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC PATCH] fs: call inode_sb_list_add() outside of inode hash lock

On Thu 20-03-25 01:46:43, Mateusz Guzik wrote:
> Both locks are highly contended during significant inode churn, and
> holding the inode hash lock while waiting for the sb list lock
> exacerbates the contention.
> 
> Why moving it out is safe: the inode at hand still has I_NEW set and
> anyone who finds it through legitimate means waits for the bit to clear,
> by which time inode_sb_list_add() is guaranteed to have finished.
> 
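For context, the "guaranteed to have finished" part follows from the
usual caller pattern -- a simplified sketch, not verbatim fs code:

	/*
	 * unlock_new_inode() clears I_NEW and wakes waiters, and callers
	 * only reach it after iget5_locked() -- and thus
	 * inode_sb_list_add() -- has returned.
	 */
	inode = iget5_locked(sb, hashval, test, set, data);
	if (inode && (inode->i_state & I_NEW)) {
		/* ... fill in the inode ... */
		unlock_new_inode(inode);
	}
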
> This significantly drops hash lock contention for me when stat()ing 20
> separate trees in parallel, each with 1000 directories * 1000 files.
> 
> However, no speedup was observed, as contention increased on the other
> locks, notably the dentry LRU lock.
> 
> Even so, removing this lock ordering requirement will help make this
> faster later.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>

I agree that I_NEW protects the inode from being looked up through the
hash while we drop inode_hash_lock, so this is safe. I'm usually a bit
reluctant about performance "improvements" that don't bring a measurable
benefit, but this patch looks like a no-brainer (I'm not worried about
the added complexity :)) and it at least reduces contention on
inode_hash_lock, so I agree the potential is there.
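For reference, the lookup side this relies on grabs a reference under
inode_hash_lock and then waits for I_NEW with no hash lock held, roughly
like this (cf. find_inode() and wait_on_inode(); the freeing and error
paths are omitted):

	spin_lock(&inode_hash_lock);
	hlist_for_each_entry(inode, head, i_hash) {
		if (inode->i_sb != sb || !test(inode, data))
			continue;
		spin_lock(&inode->i_lock);
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&inode_hash_lock);
		wait_on_inode(inode);	/* sleeps until I_NEW is cleared */
		return inode;
	}
	spin_unlock(&inode_hash_lock);

So feel free to add: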

Reviewed-by: Jan Kara <jack@...e.cz>

								Honza

> ---
> 
> There were ideas about using bitlocks, which ran into trouble because
> spinlocks would be taken *after* the bitlock, which RT kernels don't
> tolerate.
> 
> I'm thinking that thanks to RCU this can be hacked around by reversing
> the hash-specific lock -> inode lock ordering: you find the inode you
> are looking for under RCU, lock it, and only then take the bitlock and
> validate that the inode remained in place.
> 
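If I follow, the reversed ordering would look something like the sketch
below -- hypothetical only, nothing like this is in the patch, and
hash_bit_lock()/hash_bit_unlock() stand in for whatever per-bucket
bitlock primitive you'd end up with:

	rcu_read_lock();
	hlist_for_each_entry_rcu(inode, head, i_hash) {
		if (inode->i_sb != sb || !test(inode, data))
			continue;
		spin_lock(&inode->i_lock);
		hash_bit_lock(head);	/* hypothetical bitlock */
		/* revalidate: the inode may have been unhashed while we
		 * held only rcu_read_lock() */
		if (hlist_unhashed(&inode->i_hash)) {
			hash_bit_unlock(head);
			spin_unlock(&inode->i_lock);
			break;	/* raced with eviction, retry lookup */
		}
		/* ... proceed with the inode locked and validated ... */
	}
	rcu_read_unlock();
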
> The above patch removes an impediment -- the only other lock possibly
> taken while holding the hash lock.
> 
> Even if the above idea does not pan out, the patch has some value in
> decoupling these.
> 
> I am however not going to argue strongly for it given the lack of
> measurable results.
> 
>  fs/inode.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index e188bb1eb07a..8efd38c27321 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1307,8 +1307,8 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  	}
>  
>  	if (set && unlikely(set(inode, data))) {
> -		inode = NULL;
> -		goto unlock;
> +		spin_unlock(&inode_hash_lock);
> +		return NULL;
>  	}
>  
>  	/*
> @@ -1320,14 +1320,14 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
>  	hlist_add_head_rcu(&inode->i_hash, head);
>  	spin_unlock(&inode->i_lock);
>  
> +	spin_unlock(&inode_hash_lock);
> +
>  	/*
>  	 * Add inode to the sb list if it's not already. It has I_NEW at this
>  	 * point, so it should be safe to test i_sb_list locklessly.
>  	 */
>  	if (list_empty(&inode->i_sb_list))
>  		inode_sb_list_add(inode);
> -unlock:
> -	spin_unlock(&inode_hash_lock);
>  
>  	return inode;
>  }
> @@ -1456,8 +1456,8 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>  			inode->i_state = I_NEW;
>  			hlist_add_head_rcu(&inode->i_hash, head);
>  			spin_unlock(&inode->i_lock);
> -			inode_sb_list_add(inode);
>  			spin_unlock(&inode_hash_lock);
> +			inode_sb_list_add(inode);
>  
>  			/* Return the locked inode with I_NEW set, the
>  			 * caller is responsible for filling in the contents
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
