Message-ID: <ef9e7de5-9ae5-faae-1c4c-9d6357fb6ae2@huaweicloud.com>
Date: Wed, 22 May 2024 10:46:03 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xiao Ni <xni@...hat.com>, Oliver Sang <oliver.sang@...el.com>
Cc: Yu Kuai <yukuai1@...weicloud.com>, oe-lkp@...ts.linux.dev, lkp@...el.com,
linux-raid@...r.kernel.org, agk@...hat.com, snitzer@...nel.org,
mpatocka@...hat.com, song@...nel.org, dm-devel@...ts.linux.dev,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>,
Mariusz Tkaczyk <mariusz.tkaczyk@...ux.intel.com>
Subject: Re: [PATCH md-6.10 5/9] md: replace sysfs api sync_action with new
helpers
Hi,
On 2024/05/21 11:21, Xiao Ni wrote:
> Hi Kuai
>
> I've tested 07reshape5intr with the latest upstream kernel 15 times
> without failure, so it would be good to try 07reshape5intr with your
> patch set as well.
I just discussed this with Xiao on Slack; the conclusion is as follows:

The test 07reshape5intr adds a new disk to the array and then starts a
reshape:
mdadm /dev/md0 --add /dev/xxx
mdadm --grow /dev/md0 -n 3
However, the grow fails with:
mdadm: Failed to initiate reshape!
The root cause is that, in the kernel, action_store() returns -EBUSY
if MD_RECOVERY_RUNNING is set:
// mdadm add
add_bound_rdev
  set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);

                // daemon thread
                md_check_recovery
                  set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
                  // do nothing

// mdadm grow
action_store
  if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
    return -EBUSY

                  // daemon thread, later
                  clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
This is a long-standing problem, and we need new synchronization in the
kernel to make sure the grow won't fail.
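
To make the window concrete, below is a toy userspace model of the
sequence above (my own sketch; the set_bit/clear_bit/test_bit/action_store
names only mirror md.c and are defined locally here, this is not the
kernel implementation):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum { MD_RECOVERY_NEEDED, MD_RECOVERY_RUNNING };

static unsigned long recovery;                /* models mddev->recovery */

static void set_bit(int nr)   { recovery |=  (1UL << nr); }
static void clear_bit(int nr) { recovery &= ~(1UL << nr); }
static bool test_bit(int nr)  { return recovery & (1UL << nr); }

/* stand-in for the sync_action store handler */
static int action_store(const char *action)
{
	if (test_bit(MD_RECOVERY_RUNNING))    /* the check that rejects the grow */
		return -EBUSY;
	printf("starting %s\n", action);
	return 0;
}

int main(void)
{
	set_bit(MD_RECOVERY_NEEDED);          /* mdadm --add */
	set_bit(MD_RECOVERY_RUNNING);         /* daemon thread: md_check_recovery */

	/* mdadm --grow arrives before the daemon clears MD_RECOVERY_RUNNING */
	if (action_store("reshape") == -EBUSY)
		printf("mdadm: Failed to initiate reshape!\n");

	clear_bit(MD_RECOVERY_RUNNING);       /* daemon clears it, but too late */
	return 0;
}

Until the daemon thread clears MD_RECOVERY_RUNNING, a "reshape" write to
sync_action fails the same way, which is why the grow has to be
synchronized against the daemon thread inside the kernel.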
Thanks,
Kuai
>
> Regards
> Xiao
>
>
>
>
> On Tue, May 21, 2024 at 11:02 AM Oliver Sang <oliver.sang@...el.com> wrote:
>>
>> hi, Yu Kuai,
>>
>> On Tue, May 21, 2024 at 10:20:54AM +0800, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2024/05/20 23:01, kernel test robot wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> kernel test robot noticed "mdadm-selftests.07reshape5intr.fail" on:
>>>>
>>>> commit: 18effaab5f57ef44763e537c782f905e06f6c4f5 ("[PATCH md-6.10 5/9] md: replace sysfs api sync_action with new helpers")
>>>> url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-rearrange-recovery_flage/20240509-093248
>>>> base: https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git for-next
>>>> patch link: https://lore.kernel.org/all/20240509011900.2694291-6-yukuai1@huaweicloud.com/
>>>> patch subject: [PATCH md-6.10 5/9] md: replace sysfs api sync_action with new helpers
>>>>
>>>> in testcase: mdadm-selftests
>>>> version: mdadm-selftests-x86_64-5f41845-1_20240412
>>>> with following parameters:
>>>>
>>>> disk: 1HDD
>>>> test_prefix: 07reshape5intr
>>>>
>>>>
>>>>
>>>> compiler: gcc-13
>>>> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz (Haswell) with 16G memory
>>>>
>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>
>>>>
>>>>
>>>>
>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>> the same patch/commit), kindly add following tags
>>>> | Reported-by: kernel test robot <oliver.sang@...el.com>
>>>> | Closes: https://lore.kernel.org/oe-lkp/202405202204.4e3dc662-oliver.sang@intel.com
>>>>
>>>> 2024-05-14 21:36:26 mkdir -p /var/tmp
>>>> 2024-05-14 21:36:26 mke2fs -t ext3 -b 4096 -J size=4 -q /dev/sda1
>>>> 2024-05-14 21:36:57 mount -t ext3 /dev/sda1 /var/tmp
>>>> sed -e 's/{DEFAULT_METADATA}/1.2/g' \
>>>> -e 's,{MAP_PATH},/run/mdadm/map,g' mdadm.8.in > mdadm.8
>>>> /usr/bin/install -D -m 644 mdadm.8 /usr/share/man/man8/mdadm.8
>>>> /usr/bin/install -D -m 644 mdmon.8 /usr/share/man/man8/mdmon.8
>>>> /usr/bin/install -D -m 644 md.4 /usr/share/man/man4/md.4
>>>> /usr/bin/install -D -m 644 mdadm.conf.5 /usr/share/man/man5/mdadm.conf.5
>>>> /usr/bin/install -D -m 644 udev-md-raid-creating.rules /lib/udev/rules.d/01-md-raid-creating.rules
>>>> /usr/bin/install -D -m 644 udev-md-raid-arrays.rules /lib/udev/rules.d/63-md-raid-arrays.rules
>>>> /usr/bin/install -D -m 644 udev-md-raid-assembly.rules /lib/udev/rules.d/64-md-raid-assembly.rules
>>>> /usr/bin/install -D -m 644 udev-md-clustered-confirm-device.rules /lib/udev/rules.d/69-md-clustered-confirm-device.rules
>>>> /usr/bin/install -D -m 755 mdadm /sbin/mdadm
>>>> /usr/bin/install -D -m 755 mdmon /sbin/mdmon
>>>> Testing on linux-6.9.0-rc2-00012-g18effaab5f57 kernel
>>>> /lkp/benchmarks/mdadm-selftests/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log and /var/tmp/fail07reshape5intr.log for detail
>>> [root@...ora mdadm]# ./test --dev=loop --tests=07reshape5intr
>>> test: skipping tests for multipath, which is removed in upstream 6.8+
>>> kernels
>>> test: skipping tests for linear, which is removed in upstream 6.8+ kernels
>>> Testing on linux-6.9.0-rc2-00023-gf092583596a2 kernel
>>> /root/mdadm/tests/07reshape5intr... FAILED - see /var/tmp/07reshape5intr.log
>>> and /var/tmp/fail07reshape5intr.log for details
>>> (KNOWN BROKEN TEST: always fails)
>>>
>>> So, this test is already marked as BROKEN.
>>>
>>> Could you please share the whole log? Is it possible to share both
>>> log files?
>>
>>
>> We only captured one log, attached as log-18effaab5f.
>> The parent log is also attached FYI.
>>
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>
>>>>
>>>>
>>>> The kernel config and materials to reproduce are available at:
>>>> https://download.01.org/0day-ci/archive/20240520/202405202204.4e3dc662-oliver.sang@intel.com
>>>>
>>>>
>>>>
>>>
>
>
> .
>