[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZR3DY733/uZ86rAG@debian-BULLSEYE-live-builder-AMD64>
Date: Wed, 4 Oct 2023 15:56:19 -0400
From: Eric Whitney <enwlinux@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Eric Whitney <enwlinux@...il.com>, linux-ext4@...r.kernel.org,
libaokun1@...wei.com, linux-fsdevel@...r.kernel.org
Subject: Re: probable quota bug introduced in 6.6-rc1
* Jan Kara <jack@...e.cz>:
> On Tue 03-10-23 20:11:11, Eric Whitney wrote:
> > When run on my test hardware, generic/270 triggers hung task timeouts when
> > run on a 6.6-rc1 (or -rc2, -rc3, -rc4) kernel with kvm-xfstests using the
> > nojournal test scenario. The test always passes, but about 60% of the time
> > the running time of the test increases by an order of magnitude or more and
> > one or more of the hung task timeout warnings included below can be found in
> > the log.
> >
> > This does not reproduce on 6.5. Bisection leads to this patch:
> >
> > dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should
> > provide")
>
> Thanks for report! Indeed I can reproduce this. Attached patch fixes the
> problem for me, I'll queue it up in my tree once it passes some more
> testing.
>
> Honza
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
Hi Jan:
Thanks very much for the quick work - I ran 100 trials of generic/270 on
the nojournal test scenario using a patched 6.6-rc1 kernel and didn't see any
hung task timeouts. The elapsed test runtimes are comparable to those I get
when running on 6.5 using the same test hardware. So, your patch works for me.
Eric
> From cc557f91af0a970e731e3dc945a431271e59ce8c Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@...e.cz>
> Date: Wed, 4 Oct 2023 15:32:01 +0200
> Subject: [PATCH] quota: Fix slow quotaoff
>
> Eric has reported that commit dabc8b207566 ("quota: fix dqput() to
> follow the guarantees dquot_srcu should provide") heavily increases
> runtime of generic/270 xfstest for ext4 in nojournal mode. The reason
> for this is that ext4 in nojournal mode leaves dquots dirty until the last
> dqput() and thus the cleanup done in quota_release_workfn() has to write
> them all. Due to the way quota_release_workfn() is written this results
> in synchronize_srcu() call for each dirty dquot which makes the dquot
> cleanup when turning quotas off extremely slow.
>
> To be able to avoid synchronize_srcu() for each dirty dquot we need to
> rework how we track dquots to be cleaned up. Instead of keeping the last
> dquot reference while it is on releasing_dquots list, we drop it right
> away and mark the dquot with new DQ_RELEASING_B bit instead. This way we
> can we can remove dquot from releasing_dquots list when new reference to
> it is acquired and thus there's no need to call synchronize_srcu() each
> time we drop dq_list_lock.
>
> References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64
> Reported-by: Eric Whitney <enwlinux@...il.com>
> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
> CC: stable@...r.kernel.org
> Signed-off-by: Jan Kara <jack@...e.cz>
> ---
> fs/quota/dquot.c | 59 ++++++++++++++++++++++------------------
> include/linux/quota.h | 4 ++-
> include/linux/quotaops.h | 2 +-
> 3 files changed, 36 insertions(+), 29 deletions(-)
>
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 9e72bfe8bbad..f4df2420e59c 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -233,19 +233,18 @@ static void put_quota_format(struct quota_format_type *fmt)
> * All dquots are placed to the end of inuse_list when first created, and this
> * list is used for invalidate operation, which must look at every dquot.
> *
> - * When the last reference of a dquot will be dropped, the dquot will be
> - * added to releasing_dquots. We'd then queue work item which would call
> + * When the last reference of a dquot is dropped, the dquot is added to
> + * releasing_dquots. We'll then queue work item which will call
> * synchronize_srcu() and after that perform the final cleanup of all the
> - * dquots on the list. Both releasing_dquots and free_dquots use the
> - * dq_free list_head in the dquot struct. When a dquot is removed from
> - * releasing_dquots, a reference count is always subtracted, and if
> - * dq_count == 0 at that point, the dquot will be added to the free_dquots.
> + * dquots on the list. Each cleaned up dquot is moved to free_dquots list.
> + * Both releasing_dquots and free_dquots use the dq_free list_head in the dquot
> + * struct.
> *
> - * Unused dquots (dq_count == 0) are added to the free_dquots list when freed,
> - * and this list is searched whenever we need an available dquot. Dquots are
> - * removed from the list as soon as they are used again, and
> - * dqstats.free_dquots gives the number of dquots on the list. When
> - * dquot is invalidated it's completely released from memory.
> + * Unused and cleaned up dquots are in the free_dquots list and this list is
> + * searched whenever we need an available dquot. Dquots are removed from the
> + * list as soon as they are used again and dqstats.free_dquots gives the number
> + * of dquots on the list. When dquot is invalidated it's completely released
> + * from memory.
> *
> * Dirty dquots are added to the dqi_dirty_list of quota_info when mark
> * dirtied, and this list is searched when writing dirty dquots back to
> @@ -321,6 +320,7 @@ static inline void put_dquot_last(struct dquot *dquot)
> static inline void put_releasing_dquots(struct dquot *dquot)
> {
> list_add_tail(&dquot->dq_free, &releasing_dquots);
> + set_bit(DQ_RELEASING_B, &dquot->dq_flags);
> }
>
> static inline void remove_free_dquot(struct dquot *dquot)
> @@ -328,8 +328,10 @@ static inline void remove_free_dquot(struct dquot *dquot)
> if (list_empty(&dquot->dq_free))
> return;
> list_del_init(&dquot->dq_free);
> - if (!atomic_read(&dquot->dq_count))
> + if (!test_bit(DQ_RELEASING_B, &dquot->dq_flags))
> dqstats_dec(DQST_FREE_DQUOTS);
> + else
> + clear_bit(DQ_RELEASING_B, &dquot->dq_flags);
> }
>
> static inline void put_inuse(struct dquot *dquot)
> @@ -581,12 +583,6 @@ static void invalidate_dquots(struct super_block *sb, int type)
> continue;
> /* Wait for dquot users */
> if (atomic_read(&dquot->dq_count)) {
> - /* dquot in releasing_dquots, flush and retry */
> - if (!list_empty(&dquot->dq_free)) {
> - spin_unlock(&dq_list_lock);
> - goto restart;
> - }
> -
> atomic_inc(&dquot->dq_count);
> spin_unlock(&dq_list_lock);
> /*
> @@ -605,6 +601,15 @@ static void invalidate_dquots(struct super_block *sb, int type)
> * restart. */
> goto restart;
> }
> + /*
> + * The last user already dropped its reference but dquot didn't
> + * get fully cleaned up yet. Restart the scan which flushes the
> + * work cleaning up released dquots.
> + */
> + if (test_bit(DQ_RELEASING_B, &dquot->dq_flags)) {
> + spin_unlock(&dq_list_lock);
> + goto restart;
> + }
> /*
> * Quota now has no users and it has been written on last
> * dqput()
> @@ -809,18 +814,18 @@ static void quota_release_workfn(struct work_struct *work)
> /* Exchange the list head to avoid livelock. */
> list_replace_init(&releasing_dquots, &rls_head);
> spin_unlock(&dq_list_lock);
> + synchronize_srcu(&dquot_srcu);
>
> restart:
> - synchronize_srcu(&dquot_srcu);
> spin_lock(&dq_list_lock);
> while (!list_empty(&rls_head)) {
> dquot = list_first_entry(&rls_head, struct dquot, dq_free);
> - /* Dquot got used again? */
> - if (atomic_read(&dquot->dq_count) > 1) {
> - remove_free_dquot(dquot);
> - atomic_dec(&dquot->dq_count);
> - continue;
> - }
> + WARN_ON_ONCE(atomic_read(&dquot->dq_count));
> + /*
> + * Note that DQ_RELEASING_B protects us from racing with
> + * invalidate_dquots() calls so we are safe to work with the
> + * dquot even after we drop dq_list_lock.
> + */
> if (dquot_dirty(dquot)) {
> spin_unlock(&dq_list_lock);
> /* Commit dquot before releasing */
> @@ -834,7 +839,6 @@ static void quota_release_workfn(struct work_struct *work)
> }
> /* Dquot is inactive and clean, now move it to free list */
> remove_free_dquot(dquot);
> - atomic_dec(&dquot->dq_count);
> put_dquot_last(dquot);
> }
> spin_unlock(&dq_list_lock);
> @@ -875,6 +879,7 @@ void dqput(struct dquot *dquot)
> BUG_ON(!list_empty(&dquot->dq_free));
> #endif
> put_releasing_dquots(dquot);
> + atomic_dec(&dquot->dq_count);
> spin_unlock(&dq_list_lock);
> queue_delayed_work(system_unbound_wq, "a_release_work, 1);
> }
> @@ -963,7 +968,7 @@ struct dquot *dqget(struct super_block *sb, struct kqid qid)
> dqstats_inc(DQST_LOOKUPS);
> }
> /* Wait for dq_lock - after this we know that either dquot_release() is
> - * already finished or it will be canceled due to dq_count > 1 test */
> + * already finished or it will be canceled due to dq_count > 0 test */
> wait_on_dquot(dquot);
> /* Read the dquot / allocate space in quota file */
> if (!dquot_active(dquot)) {
> diff --git a/include/linux/quota.h b/include/linux/quota.h
> index fd692b4a41d5..07071e64abf3 100644
> --- a/include/linux/quota.h
> +++ b/include/linux/quota.h
> @@ -285,7 +285,9 @@ static inline void dqstats_dec(unsigned int type)
> #define DQ_FAKE_B 3 /* no limits only usage */
> #define DQ_READ_B 4 /* dquot was read into memory */
> #define DQ_ACTIVE_B 5 /* dquot is active (dquot_release not called) */
> -#define DQ_LASTSET_B 6 /* Following 6 bits (see QIF_) are reserved\
> +#define DQ_RELEASING_B 6 /* dquot is in releasing_dquots list waiting
> + * to be cleaned up */
> +#define DQ_LASTSET_B 7 /* Following 6 bits (see QIF_) are reserved\
> * for the mask of entries set via SETQUOTA\
> * quotactl. They are set under dq_data_lock\
> * and the quota format handling dquot can\
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index 11a4becff3a9..4fa4ef0a173a 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -57,7 +57,7 @@ static inline bool dquot_is_busy(struct dquot *dquot)
> {
> if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> return true;
> - if (atomic_read(&dquot->dq_count) > 1)
> + if (atomic_read(&dquot->dq_count) > 0)
> return true;
> return false;
> }
> --
> 2.35.3
>
Powered by blists - more mailing lists