Date:   Thu, 13 Oct 2022 13:54:39 -0400
From:   Sasha Levin <sashal@...nel.org>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     linux-kernel@...r.kernel.org, stable@...r.kernel.org,
        Jan Kara <jack@...e.cz>, Keith Busch <kbusch@...nel.org>,
        Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org
Subject: Re: [PATCH AUTOSEL 6.0 64/67] sbitmap: fix lockup while swapping

On Wed, Oct 12, 2022 at 06:08:50PM -0700, Hugh Dickins wrote:
>On Wed, 12 Oct 2022, Sasha Levin wrote:
>
>> From: Hugh Dickins <hughd@...gle.com>
>>
>> [ Upstream commit 30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0 ]
>>
>> Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
>> is a big improvement: without it, I had to revert to before commit
>> 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
>> to avoid the high system time and freezes which that had introduced.
>>
>> Now okay on the NVMe laptop, but 4acb83417cad is a disaster for heavy
>> swapping (kernel builds in low memory) on another: soon locking up in
>> sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
>> around with waitqueue_active() but wait_cnt 0.  Here is a backtrace,
>> showing the common pattern of outer sbitmap_queue_wake_up() interrupted
>> before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
>> are idle, in other cases they're spinning for a lock in dd_bio_merge()):
>>
>> sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
>> __blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
>> scsi_end_request < scsi_io_completion < scsi_finish_command <
>> scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
>> __irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
>> _raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
>> sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
>> __blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
>> blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
>> __submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
>> submit_bio < __swap_writepage < swap_writepage < pageout <
>> shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
>> shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
>> __alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
>> do_anonymous_page < __handle_mm_fault < handle_mm_fault <
>> do_user_addr_fault < exc_page_fault < asm_exc_page_fault
>>
>> See how the process-context sbitmap_queue_wake_up() has been interrupted,
>> after bringing wait_cnt down to 0 (and in this example, after doing its
>> wakeups), before advancing wake_index and refilling wake_cnt: an
>> interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.
>>
>> I have almost no grasp of all the possible sbitmap races, and their
>> consequences: but __sbq_wake_up() can do nothing useful while wait_cnt is 0,
>> so it is better if sbq_wake_ptr() skips on to the next ws in that case:
>> which fixes the lockup and shows no adverse consequence for me.
>>
>> The check for wait_cnt being 0 is obviously racy, and ultimately can lead
>> to lost wakeups: for example, when there is only a single waitqueue with
>> waiters.  However, lost wakeups are unlikely to matter in these cases,
>> and a proper fix requires redesign (and benchmarking) of the batched
>> wakeup code: so let's plug the hole with this bandaid for now.
>>
>> Signed-off-by: Hugh Dickins <hughd@...gle.com>
>> Reviewed-by: Jan Kara <jack@...e.cz>
>> Reviewed-by: Keith Busch <kbusch@...nel.org>
>> Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
>> Signed-off-by: Jens Axboe <axboe@...nel.dk>
>> Signed-off-by: Sasha Levin <sashal@...nel.org>
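
[For reference: the skip described in the commit message above comes down
to one extra condition in sbq_wake_ptr()'s scan over the wait queues.  A
minimal sketch of the idea, assuming the loop shape of lib/sbitmap.c
around 6.1-rc; the surrounding code may differ in other trees:

    wake_index = atomic_read(&sbq->wake_index);
    for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
            struct sbq_wait_state *ws = &sbq->ws[wake_index];

            /*
             * Skip a waitqueue whose wait_cnt is already 0: the task
             * that brought it to 0 was interrupted before refilling it,
             * and __sbq_wake_up() can do nothing useful with it, so
             * returning it here is what made the wake-up loop spin.
             */
            if (waitqueue_active(&ws->wait) &&
                atomic_read(&ws->wait_cnt) > 0) {
                    if (wake_index != atomic_read(&sbq->wake_index))
                            atomic_set(&sbq->wake_index, wake_index);
                    return ws;
            }

            wake_index = sbq_index_inc(wake_index);
    }

    return NULL;

With only one populated waitqueue, the old check (waitqueue_active()
alone) kept returning the same ws with wait_cnt 0, so the interrupt-
context caller cycled forever; with the extra check, sbq_wake_ptr()
falls through and returns NULL instead, which is also why the commit
message notes the check is racy and can lose wakeups in that case.]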
>
>Whoa!  NAK to this 6.0 backport, and to the 5.19, 5.15, 5.10, 5.4
>AUTOSEL backports of the same commit.  I never experienced such a
>lockup on those releases.  Or have I missed announcements of stable
>backports of the whole series of 6.1-rc commits to which this one
>is a fix?  (I hope not.)

Happy to drop it.

>I'm happy for my NAK to be overruled by Jens or Jan or Keith,
>if they see virtue in this commit, beyond what I'm aware of:
>but as it stands, it looks like AUTOSEL out of control again -
>it found the word "fix", and found that the commit applies cleanly,
>so thinks it must be a good stable addition.  Not necessarily so!

I'm a bit confused: the subject of the patch is "fix lockup while
swapping" and the body describes a lockup and that this patch "fixes the
lockup and shows no adverse consequence". What am I missing?

-- 
Thanks,
Sasha
