lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 4 Oct 2023 15:56:19 -0400
From:   Eric Whitney <enwlinux@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Eric Whitney <enwlinux@...il.com>, linux-ext4@...r.kernel.org,
        libaokun1@...wei.com, linux-fsdevel@...r.kernel.org
Subject: Re: probable quota bug introduced in 6.6-rc1

* Jan Kara <jack@...e.cz>:
> On Tue 03-10-23 20:11:11, Eric Whitney wrote:
> > When run on my test hardware, generic/270 triggers hung task timeouts when
> > run on a 6.6-rc1 (or -rc2, -rc3, -rc4) kernel with kvm-xfstests using the
> > nojournal test scenario.  The test always passes, but about 60% of the time
> > the running time of the test increases by an order of magnitude or more and
> > one or more of the hung task timeout warnings included below can be found in
> > the log.
> > 
> > This does not reproduce on 6.5.  Bisection leads to this patch:
> > 
> > dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should
> > provide")
> 
> Thanks for report! Indeed I can reproduce this. Attached patch fixes the
> problem for me, I'll queue it up in my tree once it passes some more
> testing.
> 
> 								Honza
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR

Hi Jan:

Thanks very much for the quick work - I ran 100 trials of generic/270 on
the nojournal test scenario using a patched 6.6-rc1 kernel and didn't see any
hung task timeouts.  The elapsed test runtimes are comparable to those I get
when running on 6.5 using the same test hardware. So, your patch works for me.

Eric

> From cc557f91af0a970e731e3dc945a431271e59ce8c Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Wed, 4 Oct 2023 15:32:01 +0200
> Subject: [PATCH] quota: Fix slow quotaoff
> 
> Eric has reported that commit dabc8b207566 ("quota: fix dqput() to
> follow the guarantees dquot_srcu should provide") heavily increases
> runtime of generic/270 xfstest for ext4 in nojournal mode. The reason
> for this is that ext4 in nojournal mode leaves dquots dirty until the last
> dqput() and thus the cleanup done in quota_release_workfn() has to write
> them all. Due to the way quota_release_workfn() is written this results
> in synchronize_srcu() call for each dirty dquot which makes the dquot
> cleanup when turning quotas off extremely slow.
> 
> To be able to avoid synchronize_srcu() for each dirty dquot we need to
> rework how we track dquots to be cleaned up. Instead of keeping the last
> dquot reference while it is on releasing_dquots list, we drop it right
> away and mark the dquot with new DQ_RELEASING_B bit instead. This way we
> can we can remove dquot from releasing_dquots list when new reference to
> it is acquired and thus there's no need to call synchronize_srcu() each
> time we drop dq_list_lock.
> 
> References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64
> Reported-by: Eric Whitney <enwlinux@...il.com>
> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
> CC: stable@...r.kernel.org
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
>  fs/quota/dquot.c         | 59 ++++++++++++++++++++++------------------
>  include/linux/quota.h    |  4 ++-
>  include/linux/quotaops.h |  2 +-
>  3 files changed, 36 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 9e72bfe8bbad..f4df2420e59c 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -233,19 +233,18 @@ static void put_quota_format(struct quota_format_type *fmt)
>   * All dquots are placed to the end of inuse_list when first created, and this
>   * list is used for invalidate operation, which must look at every dquot.
>   *
> - * When the last reference of a dquot will be dropped, the dquot will be
> - * added to releasing_dquots. We'd then queue work item which would call
> + * When the last reference of a dquot is dropped, the dquot is added to
> + * releasing_dquots. We'll then queue work item which will call
>   * synchronize_srcu() and after that perform the final cleanup of all the
> - * dquots on the list. Both releasing_dquots and free_dquots use the
> - * dq_free list_head in the dquot struct. When a dquot is removed from
> - * releasing_dquots, a reference count is always subtracted, and if
> - * dq_count == 0 at that point, the dquot will be added to the free_dquots.
> + * dquots on the list. Each cleaned up dquot is moved to free_dquots list.
> + * Both releasing_dquots and free_dquots use the dq_free list_head in the dquot
> + * struct.
>   *
> - * Unused dquots (dq_count == 0) are added to the free_dquots list when freed,
> - * and this list is searched whenever we need an available dquot.  Dquots are
> - * removed from the list as soon as they are used again, and
> - * dqstats.free_dquots gives the number of dquots on the list. When
> - * dquot is invalidated it's completely released from memory.
> + * Unused and cleaned up dquots are in the free_dquots list and this list is
> + * searched whenever we need an available dquot. Dquots are removed from the
> + * list as soon as they are used again and dqstats.free_dquots gives the number
> + * of dquots on the list. When dquot is invalidated it's completely released
> + * from memory.
>   *
>   * Dirty dquots are added to the dqi_dirty_list of quota_info when mark
>   * dirtied, and this list is searched when writing dirty dquots back to
> @@ -321,6 +320,7 @@ static inline void put_dquot_last(struct dquot *dquot)
>  static inline void put_releasing_dquots(struct dquot *dquot)
>  {
>  	list_add_tail(&dquot->dq_free, &releasing_dquots);
> +	set_bit(DQ_RELEASING_B, &dquot->dq_flags);
>  }
>  
>  static inline void remove_free_dquot(struct dquot *dquot)
> @@ -328,8 +328,10 @@ static inline void remove_free_dquot(struct dquot *dquot)
>  	if (list_empty(&dquot->dq_free))
>  		return;
>  	list_del_init(&dquot->dq_free);
> -	if (!atomic_read(&dquot->dq_count))
> +	if (!test_bit(DQ_RELEASING_B, &dquot->dq_flags))
>  		dqstats_dec(DQST_FREE_DQUOTS);
> +	else
> +		clear_bit(DQ_RELEASING_B, &dquot->dq_flags);
>  }
>  
>  static inline void put_inuse(struct dquot *dquot)
> @@ -581,12 +583,6 @@ static void invalidate_dquots(struct super_block *sb, int type)
>  			continue;
>  		/* Wait for dquot users */
>  		if (atomic_read(&dquot->dq_count)) {
> -			/* dquot in releasing_dquots, flush and retry */
> -			if (!list_empty(&dquot->dq_free)) {
> -				spin_unlock(&dq_list_lock);
> -				goto restart;
> -			}
> -
>  			atomic_inc(&dquot->dq_count);
>  			spin_unlock(&dq_list_lock);
>  			/*
> @@ -605,6 +601,15 @@ static void invalidate_dquots(struct super_block *sb, int type)
>  			 * restart. */
>  			goto restart;
>  		}
> +		/*
> +		 * The last user already dropped its reference but dquot didn't
> +		 * get fully cleaned up yet. Restart the scan which flushes the
> +		 * work cleaning up released dquots.
> +		 */
> +		if (test_bit(DQ_RELEASING_B, &dquot->dq_flags)) {
> +			spin_unlock(&dq_list_lock);
> +			goto restart;
> +		}
>  		/*
>  		 * Quota now has no users and it has been written on last
>  		 * dqput()
> @@ -809,18 +814,18 @@ static void quota_release_workfn(struct work_struct *work)
>  	/* Exchange the list head to avoid livelock. */
>  	list_replace_init(&releasing_dquots, &rls_head);
>  	spin_unlock(&dq_list_lock);
> +	synchronize_srcu(&dquot_srcu);
>  
>  restart:
> -	synchronize_srcu(&dquot_srcu);
>  	spin_lock(&dq_list_lock);
>  	while (!list_empty(&rls_head)) {
>  		dquot = list_first_entry(&rls_head, struct dquot, dq_free);
> -		/* Dquot got used again? */
> -		if (atomic_read(&dquot->dq_count) > 1) {
> -			remove_free_dquot(dquot);
> -			atomic_dec(&dquot->dq_count);
> -			continue;
> -		}
> +		WARN_ON_ONCE(atomic_read(&dquot->dq_count));
> +		/*
> +		 * Note that DQ_RELEASING_B protects us from racing with
> +		 * invalidate_dquots() calls so we are safe to work with the
> +		 * dquot even after we drop dq_list_lock.
> +		 */
>  		if (dquot_dirty(dquot)) {
>  			spin_unlock(&dq_list_lock);
>  			/* Commit dquot before releasing */
> @@ -834,7 +839,6 @@ static void quota_release_workfn(struct work_struct *work)
>  		}
>  		/* Dquot is inactive and clean, now move it to free list */
>  		remove_free_dquot(dquot);
> -		atomic_dec(&dquot->dq_count);
>  		put_dquot_last(dquot);
>  	}
>  	spin_unlock(&dq_list_lock);
> @@ -875,6 +879,7 @@ void dqput(struct dquot *dquot)
>  	BUG_ON(!list_empty(&dquot->dq_free));
>  #endif
>  	put_releasing_dquots(dquot);
> +	atomic_dec(&dquot->dq_count);
>  	spin_unlock(&dq_list_lock);
>  	queue_delayed_work(system_unbound_wq, &quota_release_work, 1);
>  }
> @@ -963,7 +968,7 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid)
>  		dqstats_inc(DQST_LOOKUPS);
>  	}
>  	/* Wait for dq_lock - after this we know that either dquot_release() is
> -	 * already finished or it will be canceled due to dq_count > 1 test */
> +	 * already finished or it will be canceled due to dq_count > 0 test */
>  	wait_on_dquot(dquot);
>  	/* Read the dquot / allocate space in quota file */
>  	if (!dquot_active(dquot)) {
> diff --git a/include/linux/quota.h b/include/linux/quota.h
> index fd692b4a41d5..07071e64abf3 100644
> --- a/include/linux/quota.h
> +++ b/include/linux/quota.h
> @@ -285,7 +285,9 @@ static inline void dqstats_dec(unsigned int type)
>  #define DQ_FAKE_B	3	/* no limits only usage */
>  #define DQ_READ_B	4	/* dquot was read into memory */
>  #define DQ_ACTIVE_B	5	/* dquot is active (dquot_release not called) */
> -#define DQ_LASTSET_B	6	/* Following 6 bits (see QIF_) are reserved\
> +#define DQ_RELEASING_B	6	/* dquot is in releasing_dquots list waiting
> +				 * to be cleaned up */
> +#define DQ_LASTSET_B	7	/* Following 6 bits (see QIF_) are reserved\
>  				 * for the mask of entries set via SETQUOTA\
>  				 * quotactl. They are set under dq_data_lock\
>  				 * and the quota format handling dquot can\
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index 11a4becff3a9..4fa4ef0a173a 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -57,7 +57,7 @@ static inline bool dquot_is_busy(struct dquot *dquot)
>  {
>  	if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>  		return true;
> -	if (atomic_read(&dquot->dq_count) > 1)
> +	if (atomic_read(&dquot->dq_count) > 0)
>  		return true;
>  	return false;
>  }
> -- 
> 2.35.3
> 

Powered by blists - more mailing lists