linux-kernel - Re: [PATCH][RFC]fix soft lock up at NFS mount by per-SB LRU-list of unused dentries

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 23 May 2008 08:56:08 +1000
From:	David Chinner <dgc@....com>
To:	Kentaro Makita <k-makita@...css.fujitsu.com>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	akpm@...ux-foundation.org, dgc@....com, viro@...IV.linux.org.uk
Subject: Re: [PATCH][RFC]fix soft lock up at NFS mount by per-SB LRU-list of unused dentries

On Thu, May 22, 2008 at 11:22:18AM +0900, Kentaro Makita wrote:
> +/*
> + * Shrink the dentry LRU on a given superblock.
> + * @sb   : superblock to shrink dentry LRU.
> + * @count: If count is NULL, we prune all dentries on superblock.
> + * @flags: If flags is non-zero, we need to do special processing based on
> + * which flags are set. This means we don't need to maintain multiple
> + * similar copies of this loop.
> + */
> +static void __shrink_dcache_sb(struct super_block *sb, int *count, int flags)
> +{
> +	LIST_HEAD(referenced);
> +	LIST_HEAD(tmp);
> +	struct dentry *dentry;
> +	int cnt = 0;
> +
> +	BUG_ON(!sb);
> +	BUG_ON((flags & DCACHE_REFERENCED) && count == NULL);
> +	spin_lock(&dcache_lock);
> +	if (count != NULL)
> +		/* called from prune_dcache() and shrink_dcache_parent() */
> +		cnt = *count;

I'd convert all the 'if (count == NULL)' to 'if (!count)' and same
for 'if (count != NULL)' to 'if (count)'....

> +restart:
> +	if (count == NULL)
> +		list_splice_init(&sb->s_dentry_lru, &tmp);
> +	else {

Probably should add {} to the first branch here as well.

> +		while (!list_empty(&sb->s_dentry_lru)) {
> +			dentry = list_entry(sb->s_dentry_lru.prev,
> +					struct dentry, d_lru);
> +			BUG_ON(dentry->d_sb != sb);
> +
> +			spin_lock(&dentry->d_lock);
> +		/*
> +		 * If we are honouring the DCACHE_REFERENCED flag and the
> +		 * dentry has this flag set, don't free it. Clear the flag
> +		 * and put it back on the LRU
> +		 */
> +			if ((flags & DCACHE_REFERENCED)
> +				&& (dentry->d_flags & DCACHE_REFERENCED)) {

Indentation of the second line of the if is the same as the block of
code underneath it. Probably should read:

			if ((flags & DCACHE_REFERENCED) &&
			    (dentry->d_flags & DCACHE_REFERENCED)) {
> +				dentry->d_flags &= ~DCACHE_REFERENCED;
> +				list_move_tail(&dentry->d_lru, &referenced);
> +				spin_unlock(&dentry->d_lock);
> +			} else {
> +				list_move_tail(&dentry->d_lru, &tmp);
> +				spin_unlock(&dentry->d_lock);
> +				cnt--;
> +				if (!cnt)
> +					break;
				if (--cnt == 0)
					break;
> +			}
> +		}
> +	}

I'm wondering if this loop is an excessively long time to be holding the
dcache_lock. I guess the hol dtime is limited by the size of *count being
passed in. I think we could also do a:

		cond_resched_lock(&dcache_lock);
	
in the loop here to prevent this from occurring....

> +	while (!list_empty(&tmp)) {
> +		dentry = list_entry(tmp.prev, struct dentry, d_lru);
> +		dentry_lru_del_init(dentry);
> +		spin_lock(&dentry->d_lock);
> +		/*
> +		 * We found an inuse dentry which was not removed from
> +		 * the LRU because of laziness during lookup.  Do not free
> +		 * it - just keep it off the LRU list.
> +		 */
> +		if (atomic_read(&dentry->d_count)) {
> +			spin_unlock(&dentry->d_lock);
> +			continue;
> +		}
> +		prune_one_dentry(dentry);
> +		/* dentry->d_lock is dropped in prune_one_dentry() */
> +		cond_resched_lock(&dcache_lock);
> +	}
> +	if ((count == NULL) && (!list_empty(&sb->s_dentry_lru)))
> +		goto restart;
> +	if (count != NULL)
> +		*count = cnt;

	if (count)
		*count = cnt;
	else if (!list_empty(&sb->s_dentry_lru))
		goto restart;

> +	if (!list_empty(&referenced))
> +		list_splice(&referenced, &sb->s_dentry_lru);
> +	spin_unlock(&dcache_lock);
> +}
> +
>  /**
>   * prune_dcache - shrink the dcache
>   * @count: number of entries to try and free
> - * @sb: if given, ignore dentries for other superblocks
> - *         which are being unmounted.
>   *
>   * Shrink the dcache. This is done when we need
>   * more memory, or simply when we need to unmount
> @@ -425,111 +526,75 @@ static void prune_one_dentry(struct dent
>   * all the dentries are in use.
>   */
> 
> -static void prune_dcache(int count, struct super_block *sb)
> +static void prune_dcache(int count)
>  {
> -	spin_lock(&dcache_lock);
> -	for (; count ; count--) {
> -		struct dentry *dentry;
> -		struct list_head *tmp;
> -		struct rw_semaphore *s_umount;
> +	struct super_block *sb;
> +	int w_count;
> +	int unused = dentry_stat.nr_unused;
> +	int prune_ratio;
> +	int pruned;
> 
> -		cond_resched_lock(&dcache_lock);
> -
> -		tmp = dentry_unused.prev;
> -		if (sb) {
> -			/* Try to find a dentry for this sb, but don't try
> -			 * too hard, if they aren't near the tail they will
> -			 * be moved down again soon
> -			 */
> -			int skip = count;
> -			while (skip && tmp != &dentry_unused &&
> -			    list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
> -				skip--;
> -				tmp = tmp->prev;
> -			}
> -		}
> -		if (tmp == &dentry_unused)
> -			break;
> -		list_del_init(tmp);
> -		prefetch(dentry_unused.prev);
> - 		dentry_stat.nr_unused--;
> -		dentry = list_entry(tmp, struct dentry, d_lru);
> -
> - 		spin_lock(&dentry->d_lock);
> -		/*
> -		 * We found an inuse dentry which was not removed from
> -		 * dentry_unused because of laziness during lookup.  Do not free
> -		 * it - just keep it off the dentry_unused list.
> -		 */
> - 		if (atomic_read(&dentry->d_count)) {
> - 			spin_unlock(&dentry->d_lock);
> -			continue;
> -		}
> -		/* If the dentry was recently referenced, don't free it. */
> -		if (dentry->d_flags & DCACHE_REFERENCED) {
> -			dentry->d_flags &= ~DCACHE_REFERENCED;
> - 			list_add(&dentry->d_lru, &dentry_unused);
> - 			dentry_stat.nr_unused++;
> - 			spin_unlock(&dentry->d_lock);
> +	if (unused == 0 || count == 0)
> +		return;
> +	spin_lock(&dcache_lock);
> +restart:
> +	if (count >= unused)
> +		prune_ratio = 1;
> +	else
> +		prune_ratio = unused / count;

unused = 100,000, count = 128 => prune_ratio = 781.

if we have 3 filesystems with 70,000, 25,000 and 5,000 dentries a piece,
that gives us:

> +	spin_lock(&sb_lock);
> +	list_for_each_entry(sb, &super_blocks, s_list) {

Question on lock ordering of sb_lock vs dcache_lock - which is the inner
lock? Are the two of them held together anywhere else? (/me doesn't
have time to searchthe code right now).

> +		if (sb->s_nr_dentry_unused == 0)
>  			continue;
> -		}
> -		/*
> -		 * If the dentry is not DCACHED_REFERENCED, it is time
> -		 * to remove it from the dcache, provided the super block is
> -		 * NULL (which means we are trying to reclaim memory)
> -		 * or this dentry belongs to the same super block that
> -		 * we want to shrink.
> +		sb->s_count++;
> +		/* Now, we reclaim unused dentrins with fairness.
> +		 * we reclaim them same percentage on each superblocks.
> +		 * we calculate number of dentries to scan on this sb
> +		 * based on following way, but impelementation is arranged
> +		 * to avoid overflow.
> +		 * number of dentries to scan on this sb =
> +		 * count * (number of dentries on this sb /
> +		 * number of dentries in the machine)
>  		 */
> +		spin_unlock(&sb_lock);
> +		if (prune_ratio != 1)
> +			w_count = (sb->s_nr_dentry_unused / prune_ratio) + 1;
> +		else
> +			w_count = sb->s_nr_dentry_unused;
> +		pruned = w_count;

70,000 -> 90
25,000 -> 33
 5,000 -> 7

Total = 130. So we will prune a small number more than is asked for. I guess
this isn't a big problem because we exit the loop if count <= 0.

Otherwise it's looking good.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/