[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <391b1763-7146-857-e3b6-dc2a8e797162@google.com>
Date: Fri, 23 Sep 2022 14:29:37 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Keith Busch <kbusch@...nel.org>
cc: Hugh Dickins <hughd@...gle.com>, Jan Kara <jack@...e.cz>,
Jens Axboe <axboe@...nel.dk>,
Yu Kuai <yukuai1@...weicloud.com>,
Liu Song <liusong@...ux.alibaba.com>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH next] sbitmap: fix lockup while swapping
On Fri, 23 Sep 2022, Keith Busch wrote:
> Does the following fix the observation? Rational being that there's no reason
> to spin on the current wait state that is already under handling; let
> subsequent clearings proceed to the next inevitable wait state immediately.
It's running fine without lockup so far; but doesn't this change merely
narrow the window? If this is interrupted in between atomic_try_cmpxchg()
setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
don't we run the same risk as before, of sbitmap_queue_wake_up() from
the interrupt handler getting stuck on that wait_cnt 0?
>
> ---
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index 624fa7f118d1..47bf7882210b 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -634,6 +634,13 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
>
> *nr -= sub;
>
> + /*
> + * Increase wake_index before updating wait_cnt, otherwise concurrent
> + * callers can see valid wait_cnt in old waitqueue, which can cause
> + * invalid wakeup on the old waitqueue.
> + */
> + sbq_index_atomic_inc(&sbq->wake_index);
> +
> /*
> * When wait_cnt == 0, we have to be particularly careful as we are
> * responsible to reset wait_cnt regardless whether we've actually
> @@ -660,13 +667,6 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq, int *nr)
> * of atomic_set().
> */
> smp_mb__before_atomic();
> -
> - /*
> - * Increase wake_index before updating wait_cnt, otherwise concurrent
> - * callers can see valid wait_cnt in old waitqueue, which can cause
> - * invalid wakeup on the old waitqueue.
> - */
> - sbq_index_atomic_inc(&sbq->wake_index);
> atomic_set(&ws->wait_cnt, wake_batch);
>
> return ret || *nr;
> --
Powered by blists - more mailing lists