lists.openwall.net - Open Source and information security mailing list archives
Date:   Sun, 17 Sep 2023 10:55:44 +0200
From:   Donald Buczek <buczek@...gen.mpg.de>
To:     Dragan Stancevic <dragan@...ncevic.com>,
        Yu Kuai <yukuai1@...weicloud.com>, song@...nel.org
Cc:     guoqing.jiang@...ux.dev, it+raid@...gen.mpg.de,
        linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
        msmith626@...il.com, "yangerkun@...wei.com" <yangerkun@...wei.com>
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition

On 9/14/23 08:03, Donald Buczek wrote:
> On 9/13/23 16:16, Dragan Stancevic wrote:
>> Hi Donald-
>> [...]
>> Here is a list of changes for 6.1:
>>
>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>> f71209b1f21c md: enhance checking in md_check_recovery()
>> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"
> 
> Thanks!
> 
> I've put these patches on v6.1.52. A few hours ago, I started a script which transitions the three md devices of a very active backup server through idle->check->idle every 6 minutes. It has gone through ~400 iterations so far. No lock-ups yet.

Oh dear, it looks like the deadlock problem is _not_ fixed by these patches.

We've had a lockup again after ~3 days of operation. Again, the `echo idle > $sys/md/sync_action` is hanging:

# # /proc/70554/task/70554: mdcheck.safe : /bin/bash /usr/bin/mdcheck.safe --continue --duration 06:00
# cat /proc/70554/task/70554/stack

[<0>] action_store+0x17f/0x390
[<0>] md_attr_store+0x83/0xf0
[<0>] kernfs_fop_write_iter+0x117/0x1b0
[<0>] vfs_write+0x2ce/0x400
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x43/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce

And everything else directed at that specific raid (md0) is dead, too. No task is busy-looping.
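For context, the transition script mentioned above boils down to something like the following sketch. The function name, sysfs path, and interval are my shorthand here, not the actual script:

```shell
#!/bin/sh
# Hypothetical sketch of the idle->check->idle stress loop: start a "check"
# on one md device, let it run, then request "idle" again.  The final write
# is the one that hangs in action_store() in the report above.
cycle_sync_action() {
    syspath=$1                               # e.g. /sys/devices/virtual/block/md0
    interval=${2:-360}                       # 6 minutes between transitions
    echo check > "$syspath/md/sync_action"   # kick off a check resync
    sleep "$interval"
    echo idle > "$syspath/md/sync_action"    # this write blocks when the bug hits
}
```

In the deadlocked state the second write never returns, which is why the `mdcheck.safe` bash process shows the action_store stack above.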

So as it looks now, we can't go from 5.15.X to 6.1.X as we would like. These patches don't fix the problem, and our own patch no longer works with 6.1. Unfortunately, this happened on a production system which I need to reboot and which is not available for further analysis. We'd need to reproduce the problem on a dedicated machine to really work on it.

Here's some more possibly interesting procfs output and some examples of tasks.

/sys/devices/virtual/block/md0/inflight : 0 3936

# /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid6 sdae[0] sdad[15] sdac[14] sdab[13] sdaa[12] sdz[11] sdy[10] sdx[9] sdw[8] sdv[7] sdu[6] sdt[5] sds[4] sdah[3] sdag[2] sdaf[1]
       109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
       bitmap: 0/59 pages [0KB], 65536KB chunk

md0 : active raid6 sdc[0] sdr[17] sdq[16] sdp[13] sdo[12] sdn[11] sdm[10] sdl[9] sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
       109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
       [===>.................]  check = 15.9% (1242830396/7813894144) finish=14788.4min speed=7405K/sec
       bitmap: 53/59 pages [212KB], 65536KB chunk

unused devices: <none>

# # /proc/66024/task/66024: md0_resync :
# cat /proc/66024/task/66024/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_sync_request+0x38b/0x3b0
[<0>] md_do_sync.cold+0x40c/0x985
[<0>] md_thread+0xb1/0x160
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/939/task/939: md0_raid6 :
# cat /proc/939/task/939/stack

[<0>] md_thread+0x12d/0x160
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/1228/task/1228: xfsaild/md0 :
# cat /proc/1228/task/1228/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_make_request+0x24c/0x1170
[<0>] md_handle_request+0x131/0x220
[<0>] __submit_bio+0x89/0x130
[<0>] submit_bio_noacct_nocheck+0x160/0x360
[<0>] _xfs_buf_ioapply+0x26c/0x420
[<0>] __xfs_buf_submit+0x64/0x1d0
[<0>] xfs_buf_delwri_submit_buffers+0xc5/0x1e0
[<0>] xfsaild+0x2a0/0x880
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/49747/task/49747: kworker/24:2+xfs-inodegc/md0 :
# cat /proc/49747/task/49747/stack

[<0>] xfs_buf_lock+0x35/0xf0
[<0>] xfs_buf_find_lock+0x45/0xf0
[<0>] xfs_buf_get_map+0x17d/0xa60
[<0>] xfs_buf_read_map+0x52/0x280
[<0>] xfs_trans_read_buf_map+0x115/0x350
[<0>] xfs_btree_read_buf_block.constprop.0+0x9a/0xd0
[<0>] xfs_btree_lookup_get_block+0x97/0x170
[<0>] xfs_btree_lookup+0xc4/0x4a0
[<0>] xfs_difree_finobt+0x62/0x250
[<0>] xfs_difree+0x130/0x1c0
[<0>] xfs_ifree+0x86/0x510
[<0>] xfs_inactive_ifree.isra.0+0xa2/0x1c0
[<0>] xfs_inactive+0xf8/0x170
[<0>] xfs_inodegc_worker+0x90/0x140
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/49844/task/49844: kworker/30:3+xfs-sync/md0 :
# cat /proc/49844/task/49844/stack

[<0>] __flush_workqueue+0x10e/0x390
[<0>] xlog_cil_push_now.isra.0+0x25/0x90
[<0>] xlog_cil_force_seq+0x7c/0x240
[<0>] xfs_log_force+0x83/0x240
[<0>] xfs_log_worker+0x3b/0xd0
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30


# # /proc/52646/task/52646: kworker/u263:2+xfs-cil/md0 :
# cat /proc/52646/task/52646/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_make_request+0x24c/0x1170
[<0>] md_handle_request+0x131/0x220
[<0>] __submit_bio+0x89/0x130
[<0>] submit_bio_noacct_nocheck+0x160/0x360
[<0>] xlog_state_release_iclog+0xf6/0x1d0
[<0>] xlog_write_get_more_iclog_space+0x79/0xf0
[<0>] xlog_write+0x334/0x3b0
[<0>] xlog_cil_push_work+0x501/0x740
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/52753/task/52753: rm : rm -rf /project/pbackup_gone/data/C8029/home_Cyang/home_Cyang:202306011248:C3019.BEING_DELETED
# cat /proc/52753/task/52753/stack

[<0>] xfs_buf_lock+0x35/0xf0
[<0>] xfs_buf_find_lock+0x45/0xf0
[<0>] xfs_buf_get_map+0x17d/0xa60
[<0>] xfs_buf_read_map+0x52/0x280
[<0>] xfs_trans_read_buf_map+0x115/0x350
[<0>] xfs_read_agi+0x98/0x140
[<0>] xfs_iunlink+0x63/0x1f0
[<0>] xfs_remove+0x280/0x3a0
[<0>] xfs_vn_unlink+0x53/0xa0
[<0>] vfs_rmdir.part.0+0x5e/0x1e0
[<0>] do_rmdir+0x15c/0x1c0
[<0>] __x64_sys_unlinkat+0x4b/0x60
[<0>] do_syscall_64+0x43/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce
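The per-task dumps above can be collected with something along these lines (an assumed sketch, not the exact commands I ran): walk every task, keep the ones in uninterruptible sleep, and print command line plus kernel stack.

```shell
#!/bin/sh
# Sketch: dump the kernel stack of every task in D (uninterruptible) state,
# in the same "# /proc/PID/task/TID: cmdline" format used above.
# Needs root to read other processes' /proc/*/task/*/stack.
dump_blocked_stacks() {
    for stack in /proc/[0-9]*/task/[0-9]*/stack; do
        task=${stack%/stack}
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        [ "$state" = D ] || continue
        printf '# # %s: %s\n' "$task" "$(tr '\0' ' ' < "$task/cmdline" 2>/dev/null)"
        cat "$stack" 2>/dev/null
        echo
    done
}
```

Run on the hung machine, this surfaces the md0_resync, xfsaild, and xfs worker threads all parked in raid5_get_active_stripe() or behind xfs_buf_lock().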

Best
   Donald

-- 
Donald Buczek
buczek@...gen.mpg.de
Tel: +49 30 8413 1433
