Message-ID: <20220926114416.t7t65u66ze76aiz7@quack3>
Date:   Mon, 26 Sep 2022 13:44:16 +0200
From:   Jan Kara <jack@...e.cz>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Keith Busch <kbusch@...nel.org>, Jan Kara <jack@...e.cz>,
        Jens Axboe <axboe@...nel.dk>,
        Yu Kuai <yukuai1@...weicloud.com>,
        Liu Song <liusong@...ux.alibaba.com>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH next] sbitmap: fix lockup while swapping

On Fri 23-09-22 16:15:29, Hugh Dickins wrote:
> On Fri, 23 Sep 2022, Hugh Dickins wrote:
> > On Fri, 23 Sep 2022, Keith Busch wrote:
> > 
> > > Does the following fix the observation? Rationale being that there's no reason
> > > to spin on the current wait state that is already under handling; let
> > > subsequent clearings proceed to the next inevitable wait state immediately.
> > 
> > It's running fine without lockup so far; but doesn't this change merely
> > narrow the window?  If this is interrupted in between atomic_try_cmpxchg()
> > setting wait_cnt to 0 and sbq_index_atomic_inc() advancing wake_index,
> > don't we run the same risk as before, of sbitmap_queue_wake_up() from
> > the interrupt handler getting stuck on that wait_cnt 0?
> 
> Yes, it ran successfully for 50 minutes, then an interrupt came in
> immediately after the cmpxchg, and it locked up just as before.
> 
> Easily dealt with by disabling interrupts, no doubt, but I assume it's a
> badge of honour not to disable interrupts here (except perhaps in waking).

I don't think any magic with sbq_index_atomic_inc() is going to reliably
fix this. After all, the current waitqueue may be the only one with active
waiters, so sbq_wake_ptr() will always end up returning it regardless of
the current value of sbq->wake_index.

Honestly, this whole code needs a serious redesign. I have some
simplifications in mind, but they will take some thinking and benchmarking,
so we need a fix for the interim. I pondered for quite some time over a
band-aid for the problem you've found but didn't come up with anything
satisfactory.

In the end I see two options:

1) Take your patch (as wrong as it is ;). Yes, it can lead to lost wakeups,
but we have been living with those for a relatively long time, so we can
probably live with them a while longer.

2) Revert Yu Kuai's original fix 040b83fcecfb8 ("sbitmap: fix possible io
hung due to lost wakeup") and my fixup 48c033314f37 ("sbitmap: Avoid leaving
waitqueue in invalid state in __sbq_wake_up()"). But then Keith would have
to redo his batched accounting patches on top.

> Some clever way to make the wait_cnt and wake_index adjustments atomic?
> 
> Or is this sbitmap_queue_wake_up() interrupting sbitmap_queue_wake_up()
> just supposed never to happen, the counts preventing it: but some
> misaccounting letting it happen by mistake?

No, I think that is in principle a situation that we have to accommodate.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
