Message-ID: <454e24e1-9713-f267-6332-d95f1273f378@huaweicloud.com>
Date: Thu, 27 Jul 2023 10:38:12 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: David Howells <dhowells@...hat.com>, Theodore Ts'o <tytso@....edu>,
Andrew Morton <akpm@...ux-foundation.org>,
Song Liu <song@...nel.org>, Christoph Hellwig <hch@....de>,
Alasdair Kergon <agk@...hat.com>
Cc: linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, dm-devel@...hat.com,
linux-ext4@...r.kernel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [dm-devel] Processes hung in "D" state in ext4, mm, md and
dmcrypt
Hi,
On 2023/07/26 18:02, David Howells wrote:
> Hi,
>
> With 6.5-rc2 (6.5.0-0.rc2.20230721gitf7e3a1bafdea.20.fc39.x86_64), I'm seeing
> a bunch of processes getting stuck in the D state on my desktop after a few
> hours of reading email and compiling stuff. It's happened every day this week
> so far and I managed to grab stack traces of the stuck processes this morning
> (see attached).
>
> There are two blockdevs involved below, /dev/md2 and /dev/md3. md3 is a raid1
> array with two partitions with an ext4 partition on it. md2 is similar but
> it's dm-crypted and ext4 is on top of that.
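The stacking described here (ext4 on dm-crypt on raid1 for md2, ext4 directly on raid1 for md3) can be read back from sysfs, which helps match each stuck thread to a layer. Below is a minimal sketch, assuming the standard `/sys/class/block/<dev>/slaves/` directories. The helper names `lower_devices` and `device_tree` are hypothetical, and the `sysfs` root parameter exists only so the code can be pointed at a fake tree.

```python
import os


def lower_devices(dev, sysfs="/sys"):
    """Return the devices directly below `dev`, i.e. its slaves/ entries."""
    slaves = os.path.join(sysfs, "class", "block", dev, "slaves")
    try:
        return sorted(os.listdir(slaves))
    except FileNotFoundError:
        # Partitions (and plain disks on some kernels) have no slaves/ dir.
        return []


def device_tree(dev, sysfs="/sys", depth=0):
    """Return indented lines describing `dev` and everything stacked below it."""
    lines = ["  " * depth + dev]
    for lower in lower_devices(dev, sysfs):
        lines.extend(device_tree(lower, sysfs, depth + 1))
    return lines
```

For example, `print("\n".join(device_tree("dm-0")))` should list dm-0, then md2, then md2's member partitions, each level indented further.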
>
> David
> ---
>
> 1015 ? D 0:04 [md2_raid1]
> 1074 ? D 0:00 [jbd2/sda6-8]
> 1138 ? D 0:00 [jbd2/md3-8]
> 1167 ? D 0:10 [dmcrypt_write/253:0]
> 1202 ? D 0:03 [jbd2/dm-0-8]
> 117547 ? D 5:12 [kworker/u16:8+flush-9:3]
> 121540 ? D 0:46 [kworker/u16:10+flush-253:0]
> 125431 pts/2 Dl+ 0:00 emacs .stgit-edit.txt
> 125469 ? D 0:00 /usr/libexec/nmh/rcvstore +kernel
>
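Stacks like the ones below can be collected by scanning /proc for tasks in state "D" and reading their `/proc/<pid>/stack` (root only). `echo w > /proc/sysrq-trigger`, which dumps blocked tasks to the kernel log, is an alternative. The following is a minimal sketch of the /proc scan; the function names are hypothetical, and it only looks at `/proc/<pid>`, not at every thread under `/proc/<pid>/task/`.

```python
import os


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces or
    ')', so split after the *last* ')' rather than on whitespace.
    """
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def _read(path):
    with open(path) as f:
        return f.read()


def d_state_tasks(proc="/proc"):
    """Yield (pid, comm, kernel_stack) for each task in uninterruptible sleep.

    Only /proc/<pid> entries (process leaders and kernel threads) are scanned;
    other threads of a multithreaded process live under /proc/<pid>/task/.
    """
    for entry in sorted((e for e in os.listdir(proc) if e.isdigit()), key=int):
        base = os.path.join(proc, entry)
        try:
            if task_state(_read(os.path.join(base, "stat"))) != "D":
                continue
            comm = _read(os.path.join(base, "comm")).strip()
            try:
                stack = _read(os.path.join(base, "stack"))
            except PermissionError:
                stack = "(reading /proc/<pid>/stack requires root)\n"
        except (FileNotFoundError, ProcessLookupError):
            continue  # the task exited while we were scanning
        yield int(entry), comm, stack
```

Usage: `for pid, comm, stack in d_state_tasks(): print(f"==={pid}=== {comm}\n{stack}")`.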
> ===1015===
> PID TTY STAT TIME COMMAND
> 1015 ? D 0:04 [md2_raid1]
> [<0>] md_super_wait+0xa2/0xe0
> [<0>] md_bitmap_daemon_work+0x183/0x3b0
> [<0>] md_check_recovery+0x42/0x5a0
> [<0>] raid1d+0x87/0x16f0 [raid1]
> [<0>] md_thread+0xab/0x190
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
This means either the I/O writing the superblock is stuck in the underlying
disks, or the superblock write itself is broken; I think the former is more
likely. You'll need to find where this I/O is now. If you can confirm that
there is no I/O pending in the underlying disks, then this problem is
related to raid.
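One rough way to check whether I/O is still pending below the array is to sample each member's `inflight` sysfs attribute (two counters: reads and writes in flight) a few times and see whether it ever drains. For per-request detail, blk-mq debugfs under `/sys/kernel/debug/block/<dev>/` (with CONFIG_BLK_DEBUG_FS) may help. The sketch below assumes the members are listed in `/sys/class/block/<md>/slaves/`. The function names are hypothetical, and a member that never drains is only a hint, not proof, that the I/O is stuck there.

```python
import os
import time


def read_inflight(dev, sysfs="/sys"):
    """Return (reads, writes) currently in flight for a disk or partition."""
    with open(os.path.join(sysfs, "class", "block", dev, "inflight")) as f:
        reads, writes = (int(x) for x in f.read().split())
    return reads, writes


def member_inflight(md, sysfs="/sys"):
    """Map each member device of md array `md` to its (reads, writes) in flight."""
    slaves = os.path.join(sysfs, "class", "block", md, "slaves")
    return {dev: read_inflight(dev, sysfs) for dev in sorted(os.listdir(slaves))}


def stuck_members(md, samples=5, interval=1.0, sysfs="/sys"):
    """Return members whose in-flight count was non-zero in every sample."""
    stuck = None
    for i in range(samples):
        busy = {dev for dev, (r, w) in member_inflight(md, sysfs).items() if r + w}
        stuck = busy if stuck is None else stuck & busy
        if i + 1 < samples:
            time.sleep(interval)
    return sorted(stuck or ())
```

For example, `stuck_members("md2")` returns the members of md2 that still had requests in flight in all five one-second samples.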
>
> ===1074===
> PID TTY STAT TIME COMMAND
> 1074 ? D 0:00 [jbd2/sda6-8]
> [<0>] jbd2_journal_commit_transaction+0x11a6/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===1138===
> PID TTY STAT TIME COMMAND
> 1138 ? D 0:00 [jbd2/md3-8]
> [<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===1167===
> PID TTY STAT TIME COMMAND
> 1167 ? D 0:10 [dmcrypt_write/253:0]
> [<0>] md_super_wait+0xa2/0xe0
> [<0>] md_bitmap_unplug+0xad/0x120
> [<0>] flush_bio_list+0xf3/0x100 [raid1]
> [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> [<0>] __blk_flush_plug+0xd8/0x160
> [<0>] blk_finish_plug+0x29/0x40
> [<0>] dmcrypt_write+0x132/0x140 [dm_crypt]
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===1202===
> PID TTY STAT TIME COMMAND
> 1202 ? D 0:03 [jbd2/dm-0-8]
> [<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===117547===
> PID TTY STAT TIME COMMAND
> 117547 ? D 5:12 [kworker/u16:8+flush-9:3]
> [<0>] blk_mq_get_tag+0x11e/0x2b0
Is this one of the raid's underlying disks? If so, this looks like I/O is
stuck in the underlying disks.
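The `flush-MAJ:MIN` suffix of a writeback worker names the bdi it is writing back, by device number. That number can be resolved through the `/sys/dev/block/MAJ:MIN` symlink, and the device's `holders/` directory shows what is stacked on top of it. The sketch below answers "is this an md member?" under those assumptions. The function names are hypothetical, and treating holder names that start with "md" as md arrays is a naming heuristic.

```python
import os


def dev_name(majmin, sysfs="/sys"):
    """Resolve 'MAJ:MIN' to a kernel block device name via /sys/dev/block."""
    link = os.path.join(sysfs, "dev", "block", majmin)
    return os.path.basename(os.path.realpath(link))


def md_holders(dev, sysfs="/sys"):
    """Return the md arrays that hold `dev`, according to its holders/ dir."""
    holders = os.path.join(sysfs, "class", "block", dev, "holders")
    try:
        # Heuristic: md arrays appear as md<N> (or md_<name>) in holders/.
        return sorted(h for h in os.listdir(holders) if h.startswith("md"))
    except FileNotFoundError:
        return []


def flush_worker_target(kworker, sysfs="/sys"):
    """Map a worker name like 'kworker/u16:8+flush-9:3' to
    (device name, md arrays that device is a member of)."""
    if "flush-" not in kworker:
        raise ValueError(f"not a writeback flush worker: {kworker!r}")
    majmin = kworker.rsplit("flush-", 1)[1].rstrip("]")
    dev = dev_name(majmin, sysfs)
    return dev, md_holders(dev, sysfs)
```

An empty list in the result means the device is not used as a member of any md array.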
Thanks,
Kuai
> [<0>] __blk_mq_alloc_requests+0x1bc/0x350
> [<0>] blk_mq_submit_bio+0x2c7/0x680
> [<0>] __submit_bio+0x8b/0x170
> [<0>] submit_bio_noacct_nocheck+0x159/0x370
> [<0>] __block_write_full_folio+0x1e1/0x400
> [<0>] writepage_cb+0x1a/0x70
> [<0>] write_cache_pages+0x144/0x3b0
> [<0>] do_writepages+0x164/0x1e0
> [<0>] __writeback_single_inode+0x3d/0x360
> [<0>] writeback_sb_inodes+0x1ed/0x4b0
> [<0>] __writeback_inodes_wb+0x4c/0xf0
> [<0>] wb_writeback+0x298/0x310
> [<0>] wb_workfn+0x35b/0x510
> [<0>] process_one_work+0x1de/0x3f0
> [<0>] worker_thread+0x51/0x390
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===121540===
> PID TTY STAT TIME COMMAND
> 121540 ? D 0:46 [kworker/u16:10+flush-253:0]
> [<0>] folio_wait_bit_common+0x13d/0x350
> [<0>] mpage_prepare_extent_to_map+0x309/0x4d0
> [<0>] ext4_do_writepages+0x25d/0xc90
> [<0>] ext4_writepages+0xad/0x180
> [<0>] do_writepages+0xcf/0x1e0
> [<0>] __writeback_single_inode+0x3d/0x360
> [<0>] writeback_sb_inodes+0x1ed/0x4b0
> [<0>] __writeback_inodes_wb+0x4c/0xf0
> [<0>] wb_writeback+0x298/0x310
> [<0>] wb_workfn+0x35b/0x510
> [<0>] process_one_work+0x1de/0x3f0
> [<0>] worker_thread+0x51/0x390
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
>
> ===125431===
> PID TTY STAT TIME COMMAND
> 125431 pts/2 Dl+ 0:00 emacs .stgit-edit.txt
> [<0>] jbd2_log_wait_commit+0xd8/0x140
> [<0>] ext4_sync_file+0x1cc/0x380
> [<0>] __x64_sys_fsync+0x3b/0x70
> [<0>] do_syscall_64+0x5d/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> ===125469===
> PID TTY STAT TIME COMMAND
> 125469 ? D 0:00 /usr/libexec/nmh/rcvstore +kernel
> [<0>] folio_wait_bit_common+0x13d/0x350
> [<0>] folio_wait_writeback+0x2c/0x90
> [<0>] truncate_inode_partial_folio+0x5e/0x1a0
> [<0>] truncate_inode_pages_range+0x1da/0x400
> [<0>] truncate_pagecache+0x47/0x60
> [<0>] ext4_setattr+0x685/0xba0
> [<0>] notify_change+0x1e0/0x4a0
> [<0>] do_truncate+0x98/0xf0
> [<0>] do_sys_ftruncate+0x15c/0x1b0
> [<0>] do_syscall_64+0x5d/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8