Message-ID: <1f879e67-4d64-4df0-5817-360d84ff8b89@huaweicloud.com>
Date: Fri, 12 Jul 2024 16:35:50 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Konstantin Kharlamov <Hi-Angel@...dex.ru>, Song Liu <song@...nel.org>,
 linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
 "yukuai (C)" <yukuai3@...wei.com>,
 "yangerkun@...wei.com" <yangerkun@...wei.com>
Subject: Re: Lockup of (raid5 or raid6) + vdo after taking out a disk under
 load

Hi,

On 2024/07/12 14:53, Konstantin Kharlamov wrote:
> (this is a dup of yesterday's bugzilla bug 219030, as some maintainers
> prefer bug reports via bugzilla and some via the ML, and the cases are
> indistinguishable)

At least I'll keep an eye on the ML, not bugzilla.
> 
> We're seeing this on both LTS `6.1.97` and the newest stable `6.9.8`
> (note though that VDO was only merged in 6.9, so for upstream it's
> probably the only one that matters; traces below are from 6.9.8).
> 
> Creating a raid5 (alternatively raid6) + VDO, then starting a load and
> after a minute physically taking out a disk of the VG results in a
> soft-lockup stack trace getting spammed over and over.
> 
> It seems unreproducible by removing a disk through software means such
> as `/sys/block/sdX/device/delete`, i.e. it seems to only happen when
> either removing the disk physically or disabling power on the JBOD slot
> with a command like `sg_ses --dev-slot-num=9 --set=3:4:1 /dev/sg26`.
> 
> Also when lockup happens, `reboot -ff` no longer does anything.
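
FWIW, if the box is wedged to the point that `reboot -ff` does nothing,
the SysRq emergency restart may still work, assuming the kernel was
built with CONFIG_MAGIC_SYSRQ (this is only a workaround, not part of
the reproducer):

    # immediate reboot, no sync or unmount; run as root
    echo b > /proc/sysrq-trigger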
> 
> When looking at repeating stacktraces, the execution seems to loop
> around `raid5_get_active_stripe`, `raid5_compute_sector`,
> `raid5_release_stripe`.
> 
> # Steps to reproduce
> 
> 1. Create raid5 LV + VDO by executing a `./mk_lvm_raid5.sh /dev/sdX1
> /dev/sdY1 /dev/sdZ1` where `mk_lvm_raid5.sh` has the following content:
> 
>      #!/bin/bash
> 
>      set -exu
> 
>      if [ "$#" -ne 3 ]; then
>          echo "Wrong number of parameters.
>      Usage: $(basename $0) disk1 disk2 disk3"
>          exit 1
>      fi
> 
>      # create the VG
>      pvcreate -f "$1" "$2" "$3"
>      vgcreate p_r5 "$1" "$2" "$3"
> 
>      # create the LV
>      lvcreate --type raid5 -i 2 -L 21474836480b -I 64K \
>          -n vdo_internal_deco_vol p_r5 -y
>      lvconvert -y --type vdo-pool --virtualsize 107374182400B \
>          -n deco_vol p_r5/vdo_internal_deco_vol
> 
> 2. Start the load by executing `fio ./fio-30%write.fio`, where
> `fio-30%write.fio` has the following content:
> 
>      [test IOPS]
>      blocksize=8k
>      filename=/dev/p_r5/deco_vol
>      filesize=100G
>      direct=1
>      buffered=0
>      ioengine=libaio
>      iodepth=32
>      rw=randrw
>      rwmixwrite=30
>      numjobs=4
>      group_reporting
>      time_based
>      runtime=99h
>      clat_percentiles=0
>      unlink=1
> 
> 3. Wait for about a minute
> 4. Remove a disk of the volume group, either physically or by turning
> off the JBOD slot's power (DO NOT use /…/device/delete).

Looks like this is because the IO fails at the raid level, and then the
dm level keeps retrying this IO (this is related to step 4), hence
raid5d gets stuck in a loop handling the new IO.
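
To double-check that mdX_raid5 really is spinning in the stripe handling
path from the reported backtrace, the functions it loops through can be
traced while the lockup is happening. A minimal sketch, assuming tracefs
is mounted at /sys/kernel/tracing and this is run as root (the function
names are the ones from the report; if one is rejected by the filter it
was likely inlined):

    cd /sys/kernel/tracing
    echo raid5_get_active_stripe raid5_compute_sector raid5_release_stripe \
        > set_ftrace_filter
    echo function > current_tracer
    sleep 10
    echo nop > current_tracer    # stop tracing
    head -n 50 trace             # a flood of mdX_raid5 hits confirms the busy loop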

Can you give the following patch a test to confirm this?

Thanks!
Kuai

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c14cf2410365..a0f784cd664c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6776,7 +6776,9 @@ static void raid5d(struct md_thread *thread)

                 while ((bio = remove_bio_from_retry(conf, &offset))) {
                         int ok;
+
                         spin_unlock_irq(&conf->device_lock);
+                       cond_resched();
                         ok = retry_aligned_read(conf, bio, offset);
                         spin_lock_irq(&conf->device_lock);
                         if (!ok)
@@ -6790,11 +6792,11 @@ static void raid5d(struct md_thread *thread)
                         break;
                 handled += batch_size;

-               if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING)) {
-                       spin_unlock_irq(&conf->device_lock);
+               spin_unlock_irq(&conf->device_lock);
+               if (mddev->sb_flags & ~(1 << MD_SB_CHANGE_PENDING))
                         md_check_recovery(mddev);
-                       spin_lock_irq(&conf->device_lock);
-               }
+               cond_resched();
+               spin_lock_irq(&conf->device_lock);
         }
         pr_debug("%d stripes handled\n", handled);
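
If it helps, here is one way to apply and test it, assuming a 6.9.8
source tree configured like the running kernel; the file name and paths
below are only examples:

    # save the diff above as /tmp/raid5d-lockup.diff, then:
    cd linux-6.9.8
    git apply --check /tmp/raid5d-lockup.diff    # dry run (or use patch -p1)
    git apply /tmp/raid5d-lockup.diff
    make -j"$(nproc)"
    sudo make modules_install install
    # reboot into the patched kernel and repeat steps 1-4 above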


> 5. Wait 30 seconds
> 
> ## Expected
> 
> `dmesg -w` won't show anything interesting
> 
> ## Actual
> 
> `dmesg -w` starts showing traces as follows (not sure why the line
> numbers are missing; I have debug symbols locally and can decode them
> manually with `gdb` if needed):
> 
>      […]
>      [  869.086048] mpt3sas_cm0: log_info(0x31110e05): originator(PL),
> code(0x11), sub_code(0x0e05)
>      [  869.139959] mpt3sas_cm0: mpt3sas_transport_port_remove: removed:
> sas_addr(0x5002538a67b01303)
>      [  869.140085] mpt3sas_cm0: removing handle(0x0022),
> sas_addr(0x5002538a67b01303)
>      [  869.140191] mpt3sas_cm0: enclosure logical
> id(0x50015b2140128f7f), slot(12)
>      [  869.140293] mpt3sas_cm0: enclosure level(0x0000), connector
> name(     )
>      [  893.262506] watchdog: BUG: soft lockup - CPU#5 stuck for 26s!
> [mdX_raid5:24608]
>      [  893.262641] Modules linked in: dm_raid mptctl mptbase bonding
> fcoe libfcoe 8021q garp mrp libfc stp llc scsi_transport_fc
> sch_fq_codel softdog dm_round_robin scsi_dh_alua intel_rapl_msr
> coretemp intel_rapl_common sb_edac x86_pkg_temp_thermal
> intel_powerclamp ast kvm_intel drm_shmem_helper drm_kms_helper kvm
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3
> sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl
> intel_cstate wdat_wdt intel_pch_thermal ipmi_ssif dm_multipath
> acpi_ipmi sg ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> acpi_power_meter ramoops reed_solomon efi_pstore drm ip_tables x_tables
> autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor xor async_tx raid6_pq libcrc32c raid0 ses enclosure mlx5_ib
> ib_core raid1 ixgbe igb mlx5_core mdio_devres xfrm_algo i2c_algo_bit
> e1000e mpt3sas hwmon dca mdio i2c_i801 ahci lpc_ich ptp i2c_smbus
> mfd_core raid_class libahci pps_core libphy scsi_transport_sas
>      [  893.264033] CPU: 5 PID: 24608 Comm: mdX_raid5 Not tainted 6.9.8-
> bstrg #3
>      [  893.264254] Hardware name: AIC HA401-LB2/LIBRA, BIOS LIBBV071
> 04/19/2017
>      [  893.264479] RIP: 0010:_raw_spin_lock_irqsave+0xe/0x30
>      [  893.264714] Code: 48 89 ef 5d e9 13 33 47 ff 0f 1f 00 90 90 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 48 89 fd
> 9c 41 5c <fa> bf 01 00 00 00 e8 07 96 43 ff 48 89 ef e8 df 32 47 ff 4c
> 89 e0
>      [  893.265229] RSP: 0018:ffffbc306091fd70 EFLAGS: 00000246
>      [  893.265502] RAX: 0000000000000001 RBX: 0000000000000001 RCX:
> 0000000000000000
>      [  893.265783] RDX: 0000000000000001 RSI: 0000000000000003 RDI:
> ffff9fbacf48c430
>      [  893.266069] RBP: ffff9fbacf48c430 R08: ffff9fcac1254d58 R09:
> 00000000ffffffff
>      [  893.266366] R10: 0000000000000000 R11: ffffbc306091fe20 R12:
> 0000000000000246
>      [  893.266664] R13: ffff9fcac1254dc0 R14: 0000000000000000 R15:
> 0000000000739d18
>      [  893.266968] FS:  0000000000000000(0000)
> GS:ffff9fc9fff40000(0000) knlGS:0000000000000000
>      [  893.267282] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      [  893.267603] CR2: 000055f91452ff88 CR3: 0000000124c74004 CR4:
> 00000000003706f0
>      [  893.267933] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
>      [  893.268268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
>      [  893.268605] Call Trace:
>      [  893.268945]  <IRQ>
>      [  893.269285]  ? watchdog_timer_fn+0x24b/0x2d0
>      [  893.269638]  ? __pfx_watchdog_timer_fn+0x10/0x10
>      [  893.269990]  ? __hrtimer_run_queues+0x112/0x2b0
>      [  893.270347]  ? hrtimer_interrupt+0x101/0x240
>      [  893.270708]  ? __sysvec_apic_timer_interrupt+0x6e/0x180
>      [  893.271075]  ? sysvec_apic_timer_interrupt+0x9d/0xd0
>      [  893.271448]  </IRQ>
>      [  893.271816]  <TASK>
>      [  893.272184]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>      [  893.272568]  ? _raw_spin_lock_irqsave+0xe/0x30
>      [  893.272956]  __wake_up+0x1d/0x60
>      [  893.273345]  md_wakeup_thread+0x38/0x70
>      [  893.273738]  raid5d+0x32a/0x5d0 [raid456]
>      [  893.274147]  md_thread+0xc1/0x170
>      [  893.274545]  ? __pfx_autoremove_wake_function+0x10/0x10
>      [  893.274951]  ? __pfx_md_thread+0x10/0x10
>      [  893.275358]  kthread+0xff/0x130
>      [  893.275760]  ? __pfx_kthread+0x10/0x10
>      [  893.276172]  ret_from_fork+0x30/0x50
>      [  893.276568]  ? __pfx_kthread+0x10/0x10
>      [  893.276961]  ret_from_fork_asm+0x1a/0x30
>      [  893.277351]  </TASK>
> 
> .
> 
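
On the undecoded offsets in the trace above: besides gdb, the kernel
tree ships scripts/decode_stacktrace.sh, which resolves them in bulk
when vmlinux was built with debug info. A rough sketch, run from the
build tree that produced 6.9.8-bstrg (argument handling differs
slightly between kernel versions):

    dmesg > /tmp/lockup.log
    ./scripts/decode_stacktrace.sh vmlinux . drivers/md < /tmp/lockup.log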

