[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220726115124.cbmcs3xgeqfwv7qw@quack3>
Date: Tue, 26 Jul 2022 13:51:24 +0200
From: Jan Kara <jack@...e.cz>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: jack@...e.cz, axboe@...nel.dk, osandov@...com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
yukuai3@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH RFC v4] sbitmap: fix possible io hung due to lost wakeup
On Sat 23-07-22 10:41:22, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@...wei.com>
>
> There are two problems can lead to lost wakeup:
>
> 1) invalid wakeup on the wrong waitqueue:
>
> For example, 2 * wake_batch tags are put, while only wake_batch threads
> are woken:
>
> __sbq_wake_up
> atomic_cmpxchg -> reset wait_cnt
> __sbq_wake_up -> decrease wait_cnt
> ...
> __sbq_wake_up -> wait_cnt is decreased to 0 again
> atomic_cmpxchg
> sbq_index_atomic_inc -> increase wake_index
> wake_up_nr -> wake up and waitqueue might be empty
> sbq_index_atomic_inc -> increase again, one waitqueue is skipped
> wake_up_nr -> invalid wake up because old wakequeue might be empty
>
> To fix the problem, increasing 'wake_index' before resetting 'wait_cnt'.
>
> 2) 'wait_cnt' can be decreased while waitqueue is empty
>
> As pointed out by Jan Kara, following race is possible:
>
> CPU1 CPU2
> __sbq_wake_up __sbq_wake_up
> sbq_wake_ptr() sbq_wake_ptr() -> the same
> wait_cnt = atomic_dec_return()
> /* decreased to 0 */
> sbq_index_atomic_inc()
> /* move to next waitqueue */
> atomic_set()
> /* reset wait_cnt */
> wake_up_nr()
> /* wake up on the old waitqueue */
> wait_cnt = atomic_dec_return()
> /*
> * decrease wait_cnt in the old
> * waitqueue, while it can be
> * empty.
> */
>
> Fix the problem by waking up before updating 'wake_index' and
> 'wait_cnt'.
>
> With this patch, noted that 'wait_cnt' is still decreased in the old
> empty waitqueue, however, the wakeup is redirected to a active waitqueue,
> and the extra decrement on the old empty waitqueue is not handled.
>
> Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
The patch looks good to me now (just one typo fix below). Thanks for the
fix! Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
> + /*
> + * Pairs with the memory barrier in sbitmap_queue_resize() to
> + * ensure that we see the batch size update before the wait
> + * count is reset.
> + *
> + * Also pairs with the implicit barrier between becrementing wait_cnt
^^^ decrementing
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists