Message-ID: <454e24e1-9713-f267-6332-d95f1273f378@huaweicloud.com>
Date:   Thu, 27 Jul 2023 10:38:12 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     David Howells <dhowells@...hat.com>, Theodore Ts'o <tytso@....edu>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Song Liu <song@...nel.org>, Christoph Hellwig <hch@....de>,
        Alasdair Kergon <agk@...hat.com>
Cc:     linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, dm-devel@...hat.com,
        linux-ext4@...r.kernel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [dm-devel] Processes hung in "D" state in ext4, mm, md and
 dmcrypt

Hi,

On 2023/07/26 18:02, David Howells wrote:
> Hi,
> 
> With 6.5-rc2 (6.5.0-0.rc2.20230721gitf7e3a1bafdea.20.fc39.x86_64), I'm seeing
> a bunch of processes getting stuck in the D state on my desktop after a few
> hours of reading email and compiling stuff.  It's happened every day this week
> so far and I managed to grab stack traces of the stuck processes this morning
> (see attached).
> 
> There are two blockdevs involved below, /dev/md2 and /dev/md3.  md3 is a raid1
> array with two partitions with an ext4 partition on it.  md2 is similar but
> it's dm-crypted and ext4 is on top of that.
> 
> David
> ---
> 
>     1015 ?        D      0:04 [md2_raid1]
>     1074 ?        D      0:00 [jbd2/sda6-8]
>     1138 ?        D      0:00 [jbd2/md3-8]
>     1167 ?        D      0:10 [dmcrypt_write/253:0]
>     1202 ?        D      0:03 [jbd2/dm-0-8]
>   117547 ?        D      5:12 [kworker/u16:8+flush-9:3]
>   121540 ?        D      0:46 [kworker/u16:10+flush-253:0]
>   125431 pts/2    Dl+    0:00 emacs .stgit-edit.txt
>   125469 ?        D      0:00 /usr/libexec/nmh/rcvstore +kernel
> 
> ===1015===
>      PID TTY      STAT   TIME COMMAND
>     1015 ?        D      0:04 [md2_raid1]
> [<0>] md_super_wait+0xa2/0xe0
> [<0>] md_bitmap_daemon_work+0x183/0x3b0
> [<0>] md_check_recovery+0x42/0x5a0
> [<0>] raid1d+0x87/0x16f0 [raid1]
> [<0>] md_thread+0xab/0x190
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30

This means either the io that writes the super_block is stuck in the
underlying disks, or the super_block write path itself is broken. I
think the former is more likely. You'll need to locate where this io
is now. If you can confirm that there is no io pending in the
underlying disks, then the problem is in raid.
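
One way to start locating the io is to look at the per-device inflight
counters in sysfs. The following is only a rough sketch, assuming the
documented format of /sys/block/<dev>/inflight (two decimal values,
reads then writes). The helper names parse_inflight and busy_devices
are hypothetical and not part of any existing tool.

```python
import os


def parse_inflight(text):
    """Parse an inflight file: two counters, reads then writes."""
    reads, writes = (int(field) for field in text.split()[:2])
    return reads, writes


def busy_devices(sysfs_root="/sys/block"):
    """Return {device: (reads, writes)} for devices with io in flight.

    sysfs_root is a parameter only so the function can be pointed at a
    copy of the tree; on a live system the default is what you want.
    """
    busy = {}
    for dev in sorted(os.listdir(sysfs_root)):
        path = os.path.join(sysfs_root, dev, "inflight")
        try:
            with open(path) as f:
                counts = parse_inflight(f.read())
        except (OSError, ValueError):
            # No inflight file, or unexpected contents: skip the device.
            continue
        if any(counts):
            busy[dev] = counts
    return busy
```

Calling busy_devices() a few times, some seconds apart, and seeing the
same non-zero counts on a member disk would suggest the io is sitting
in that disk's queue. All-zero counts on the member disks would point
back at raid.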
> 
> ===1074===
>      PID TTY      STAT   TIME COMMAND
>     1074 ?        D      0:00 [jbd2/sda6-8]
> [<0>] jbd2_journal_commit_transaction+0x11a6/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===1138===
>      PID TTY      STAT   TIME COMMAND
>     1138 ?        D      0:00 [jbd2/md3-8]
> [<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===1167===
>      PID TTY      STAT   TIME COMMAND
>     1167 ?        D      0:10 [dmcrypt_write/253:0]
> [<0>] md_super_wait+0xa2/0xe0
> [<0>] md_bitmap_unplug+0xad/0x120
> [<0>] flush_bio_list+0xf3/0x100 [raid1]
> [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> [<0>] __blk_flush_plug+0xd8/0x160
> [<0>] blk_finish_plug+0x29/0x40
> [<0>] dmcrypt_write+0x132/0x140 [dm_crypt]
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===1202===
>      PID TTY      STAT   TIME COMMAND
>     1202 ?        D      0:03 [jbd2/dm-0-8]
> [<0>] jbd2_journal_commit_transaction+0x162d/0x1a20
> [<0>] kjournald2+0xad/0x280
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===117547===
>      PID TTY      STAT   TIME COMMAND
>   117547 ?        D      5:12 [kworker/u16:8+flush-9:3]
> [<0>] blk_mq_get_tag+0x11e/0x2b0

Is this one of the raid's underlying disks? If so, it looks like io is
stuck in the underlying disks.
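
To answer that, the major:minor pair in the flush worker's name can be
mapped back to a block device. A minimal sketch, assuming the
"flush-MAJOR:MINOR" suffix seen in the ps output above and the
/sys/dev/block/MAJOR:MINOR links in sysfs. The helper names are
hypothetical. Block major 9 is the md driver, so 9:3 would normally be
md3 rather than a member disk.

```python
import re

# Matches the "flush-9:3" part of "[kworker/u16:8+flush-9:3]".
_FLUSH_RE = re.compile(r"flush-(\d+):(\d+)")


def flush_worker_device(comm):
    """Return (major, minor) from a flush kworker name, or None."""
    match = _FLUSH_RE.search(comm)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sysfs_device_link(major, minor):
    """Path of the sysfs link that resolves to the device's directory."""
    return f"/sys/dev/block/{major}:{minor}"
```

For example, flush_worker_device("[kworker/u16:8+flush-9:3]") gives
(9, 3), and resolving sysfs_device_link(9, 3) with os.path.realpath()
on the affected machine shows which device that is.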

Thanks,
Kuai

> [<0>] __blk_mq_alloc_requests+0x1bc/0x350
> [<0>] blk_mq_submit_bio+0x2c7/0x680
> [<0>] __submit_bio+0x8b/0x170
> [<0>] submit_bio_noacct_nocheck+0x159/0x370
> [<0>] __block_write_full_folio+0x1e1/0x400
> [<0>] writepage_cb+0x1a/0x70
> [<0>] write_cache_pages+0x144/0x3b0
> [<0>] do_writepages+0x164/0x1e0
> [<0>] __writeback_single_inode+0x3d/0x360
> [<0>] writeback_sb_inodes+0x1ed/0x4b0
> [<0>] __writeback_inodes_wb+0x4c/0xf0
> [<0>] wb_writeback+0x298/0x310
> [<0>] wb_workfn+0x35b/0x510
> [<0>] process_one_work+0x1de/0x3f0
> [<0>] worker_thread+0x51/0x390
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===121540===
>      PID TTY      STAT   TIME COMMAND
>   121540 ?        D      0:46 [kworker/u16:10+flush-253:0]
> [<0>] folio_wait_bit_common+0x13d/0x350
> [<0>] mpage_prepare_extent_to_map+0x309/0x4d0
> [<0>] ext4_do_writepages+0x25d/0xc90
> [<0>] ext4_writepages+0xad/0x180
> [<0>] do_writepages+0xcf/0x1e0
> [<0>] __writeback_single_inode+0x3d/0x360
> [<0>] writeback_sb_inodes+0x1ed/0x4b0
> [<0>] __writeback_inodes_wb+0x4c/0xf0
> [<0>] wb_writeback+0x298/0x310
> [<0>] wb_workfn+0x35b/0x510
> [<0>] process_one_work+0x1de/0x3f0
> [<0>] worker_thread+0x51/0x390
> [<0>] kthread+0xe5/0x120
> [<0>] ret_from_fork+0x31/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> ===125431===
>      PID TTY      STAT   TIME COMMAND
>   125431 pts/2    Dl+    0:00 emacs .stgit-edit.txt
> [<0>] jbd2_log_wait_commit+0xd8/0x140
> [<0>] ext4_sync_file+0x1cc/0x380
> [<0>] __x64_sys_fsync+0x3b/0x70
> [<0>] do_syscall_64+0x5d/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> ===125469===
>      PID TTY      STAT   TIME COMMAND
>   125469 ?        D      0:00 /usr/libexec/nmh/rcvstore +kernel
> [<0>] folio_wait_bit_common+0x13d/0x350
> [<0>] folio_wait_writeback+0x2c/0x90
> [<0>] truncate_inode_partial_folio+0x5e/0x1a0
> [<0>] truncate_inode_pages_range+0x1da/0x400
> [<0>] truncate_pagecache+0x47/0x60
> [<0>] ext4_setattr+0x685/0xba0
> [<0>] notify_change+0x1e0/0x4a0
> [<0>] do_truncate+0x98/0xf0
> [<0>] do_sys_ftruncate+0x15c/0x1b0
> [<0>] do_syscall_64+0x5d/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> --
> dm-devel mailing list
> dm-devel@...hat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
> 
> .
> 
