linux-ext4 - Re: [dm-devel] Processes hung in "D" state in ext4, mm, md and dmcrypt


Message-Id: <20230726123046.a001b6963da19ca883045271@linux-foundation.org>
Date:   Wed, 26 Jul 2023 12:30:46 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Ming Lei <tom.leiming@...il.com>
Cc:     David Howells <dhowells@...hat.com>,
        linux-block <linux-block@...r.kernel.org>,
        "Theodore Ts'o" <tytso@....edu>, Song Liu <song@...nel.org>,
        Christoph Hellwig <hch@....de>,
        Alasdair Kergon <agk@...hat.com>, linux-raid@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        dm-devel@...hat.com, linux-ext4@...r.kernel.org,
        Ming Lei <ming.lei@...hat.com>
Subject: Re: [dm-devel] Processes hung in "D" state in ext4, mm, md and
 dmcrypt

On Wed, 26 Jul 2023 23:29:51 +0800 Ming Lei <tom.leiming@...il.com> wrote:

> On Wed, Jul 26, 2023 at 6:02 PM David Howells <dhowells@...hat.com> wrote:
> >
> > Hi,
> >
> > With 6.5-rc2 (6.5.0-0.rc2.20230721gitf7e3a1bafdea.20.fc39.x86_64), I'm seeing
> > a bunch of processes getting stuck in the D state on my desktop after a few
> > hours of reading email and compiling stuff.  It's happened every day this week
> > so far and I managed to grab stack traces of the stuck processes this morning
> > (see attached).
> >
> > There are two blockdevs involved below, /dev/md2 and /dev/md3.  md3 is a raid1
> > array with two partitions with an ext4 partition on it.  md2 is similar but
> > it's dm-crypted and ext4 is on top of that.
> >
> ...
> 
> > ===117547===
> >     PID TTY      STAT   TIME COMMAND
> >  117547 ?        D      5:12 [kworker/u16:8+flush-9:3]
> > [<0>] blk_mq_get_tag+0x11e/0x2b0
> > [<0>] __blk_mq_alloc_requests+0x1bc/0x350
> > [<0>] blk_mq_submit_bio+0x2c7/0x680
> > [<0>] __submit_bio+0x8b/0x170
> > [<0>] submit_bio_noacct_nocheck+0x159/0x370
> > [<0>] __block_write_full_folio+0x1e1/0x400
> > [<0>] writepage_cb+0x1a/0x70
> > [<0>] write_cache_pages+0x144/0x3b0
> > [<0>] do_writepages+0x164/0x1e0
> > [<0>] __writeback_single_inode+0x3d/0x360
> > [<0>] writeback_sb_inodes+0x1ed/0x4b0
> > [<0>] __writeback_inodes_wb+0x4c/0xf0
> > [<0>] wb_writeback+0x298/0x310
> > [<0>] wb_workfn+0x35b/0x510
> > [<0>] process_one_work+0x1de/0x3f0
> > [<0>] worker_thread+0x51/0x390
> > [<0>] kthread+0xe5/0x120
> > [<0>] ret_from_fork+0x31/0x50
> > [<0>] ret_from_fork_asm+0x1b/0x30
> 
> BTW, -rc3 fixes one similar issue on the above code path, so please try -rc3.
> 
> 106397376c03 sbitmap: fix batching wakeup

That patch really needs a Fixes: tag, please.  And consideration for a
-stable backport.

Looking at what has changed recently in sbitmap, it seems unlikely that
106397376c03 fixes an issue that just appeared in 6.5-rcX.  But maybe
the issue you have identified has recently become easier to hit; we'll
see.
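
For anyone trying to reproduce the report quoted above: the thread does not
say how the traces were collected, so the following is only a minimal sketch
of one way to gather similar output.  It assumes a Linux /proc, root
privileges, and a kernel that exposes /proc/<pid>/stack.  The function names
(task_state, d_state_pids, kernel_stack) are made up for illustration.

```python
import os

def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the LAST ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    return rest[0]

def d_state_pids(proc="/proc"):
    """Yield (pid, comm) for tasks in uninterruptible sleep (state D)."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the task exited while we were scanning
        if task_state(line) == "D":
            comm = line[line.find("(") + 1:line.rfind(")")]
            yield int(name), comm

def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack text for pid, or a placeholder on failure."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError as e:
        # Typically a permissions problem, or the file is not available
        # on this kernel configuration.
        return f"<unavailable: {e}>\n"

if __name__ == "__main__":
    for pid, comm in d_state_pids():
        print(f"==={pid}=== {comm}")
        print(kernel_stack(pid), end="")
```

This only walks the top-level /proc entries, so it sees thread-group leaders
and kernel threads such as the kworker above, but not the individual threads
of a multi-threaded process (those live under /proc/<pid>/task).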