linux-kernel - Re: [PATCH v5 00/14] dm-raid/md/raid: fix v6.7 regressions

Message-ID: <00390b5b-44a1-72d1-794d-3e1a7b3ffe05@huaweicloud.com>
Date: Sun, 4 Feb 2024 15:00:35 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>,
 Benjamin Marzinski <bmarzins@...hat.com>
Cc: mpatocka@...hat.com, heinzm@...hat.com, xni@...hat.com,
 blazej.kucman@...ux.intel.com, agk@...hat.com, snitzer@...nel.org,
 dm-devel@...ts.linux.dev, song@...nel.org, jbrassow@....redhat.com,
 neilb@...e.de, shli@...com, akpm@...l.org, linux-kernel@...r.kernel.org,
 linux-raid@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v5 00/14] dm-raid/md/raid: fix v6.7 regressions

Hi,

On 2024/02/04 9:35, Yu Kuai wrote:
> Hi,
> 
> On 2024/02/03 11:19, Benjamin Marzinski wrote:
>> On Thu, Feb 01, 2024 at 05:25:45PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>> I applied this patchset on top of v6.8-rc1 and ran the lvm2 test suite
>>> with the following command for 24 rounds (about 2 days):
>>>
>>> # Run every lvm2 shell test whose script mentions "raid".
>>> for t in test/shell/*; do
>>>          if grep -qs raid "$t"; then
>>>                  make check T="shell/${t##*/}"
>>>          fi
>>> done
>>>
>>> failed count                             failed test
>>>        1 ###       failed: [ndev-vanilla] shell/dmsecuretest.sh
>>>        1 ###       failed: [ndev-vanilla] shell/dmsetup-integrity-keys.sh
>>>        1 ###       failed: [ndev-vanilla] shell/dmsetup-keyring.sh
>>>        5 ###       failed: [ndev-vanilla] shell/duplicate-pvs-md0.sh
>>>        1 ###       failed: [ndev-vanilla] shell/duplicate-vgid.sh
>>>        2 ###       failed: [ndev-vanilla] shell/duplicate-vgnames.sh
>>>        1 ###       failed: [ndev-vanilla] shell/fsadm-crypt.sh
>>>        1 ###       failed: [ndev-vanilla] shell/integrity.sh
>>>        6 ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>>        2 ###       failed: [ndev-vanilla] shell/lvchange-rebuild-raid.sh
>>>        5 ###       failed: [ndev-vanilla] shell/lvconvert-raid-reshape-stripes-load-reload.sh
>>>        4 ###       failed: [ndev-vanilla] shell/lvconvert-raid-restripe-linear.sh
>>>        1 ###       failed: [ndev-vanilla] shell/lvconvert-raid1-split-trackchanges.sh
>>>       20 ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>>       20 ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>>       24 ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>
>>> I also randomly picked some of these tests and verified by hand that
>>> they fail in v6.6 as well (not all tests were checked):
>>>
>>> shell/lvextend-raid.sh
>>> shell/lvcreate-large-raid.sh
>>> shell/lvconvert-repair-raid.sh
>>> shell/lvchange-rebuild-raid.sh
>>> shell/lvchange-raid1-writemostly.sh
>>
>> In my testing with this patchset on top of the head of linus's tree
>> (5c24e4e9e708) I am seeing failures in
>> shell/lvconvert-raid-reshape-stripes-load-reload.sh and
>> shell/lvconvert-repair-raid.sh in about 20% of my runs. I have never
>> seen either of these fail when running on the 6.6 kernel (ffc253263a13).
> 
> This is quite different from my testing. As I said, the test
> shell/lvconvert-repair-raid.sh is very likely to fail in v6.6 already;
> I don't know why it never fails in your testing. Test log from v6.6:
> 
> | [ 1:38.162] #lvconvert-repair-raid.sh:1+ aux teardown
> | [ 1:38.162] ## teardown.......## removing stray mapped devices with names beginning with LVMTEST3474:
> | [ 1:39.207] .set +vx; STACKTRACE; set -vx
> | [ 1:41.448] ##lvconvert-repair-raid.sh:1+ set +vx
> | [ 1:41.448] ## - /mnt/test/lvm2/test/shell/lvconvert-repair-raid.sh:1
> | [ 1:41.449] ## 1 STACKTRACE() called from /mnt/test/lvm2/test/shell/lvconvert-repair-raid.sh:1
> | [ 1:41.449] ## ERROR: The test started dmeventd (3718) unexpectedly.
> 
> The same happens in v6.8-rc1. Do you know how to fix this error?
> 
> Thanks,
> Kuai
> 
>>
>> lvconvert-repair-raid.sh creates a raid array and then disables one of
>> its drives before there's enough time to finish the initial sync and
>> tries to repair it. This is supposed to fail (it uses dm-delay devices
>> to slow down the sync). When the test succeeds, I see things like this:
>>
>> [ 0:13.469] #lvconvert-repair-raid.sh:161+ lvcreate --type raid10 -m 1 -i 2 -L 64 -n LV1 LVMTEST191946vg /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv1 /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv2 /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv3 /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4
>> [ 0:13.469]   Using default stripesize 64.00 KiB.
>> [ 0:13.483]   Logical volume "LV1" created.
>> [ 0:14.042] 6,8908,1194343108,-;device-mapper: raid: Superblocks created for new raid set
>> [ 0:14.042] 5,8909,1194348704,-;md/raid10:mdX: not clean -- starting background reconstruction
>> [ 0:14.042] 6,8910,1194349443,-;md/raid10:mdX: active with 4 out of 4 devices
>> [ 0:14.042] 4,8911,1194459161,-;mdX: bitmap file is out of date, doing full recovery
>> [ 0:14.042] 6,8912,1194563810,-;md: resync of RAID array mdX
>> [ 0:14.042]   WARNING: This metadata update is NOT backed up.
>> [ 0:14.042] aux disable_dev "$dev4"
>> [ 0:14.058] #lvconvert-repair-raid.sh:163+ aux disable_dev /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4
>> [ 0:14.058] Disabling device /tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4 (253:5)
>> [ 0:14.101] not lvconvert -y --repair $vg/$lv1
>>
>> When it fails, I see:
>>
>> [ 0:13.831] #lvconvert-repair-raid.sh:161+ lvcreate --type raid10 -m 1 -i 2 -L 64 -n LV1 LVMTEST192248vg /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv1 /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv2 /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv3 /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv4
>> [ 0:13.831]   Using default stripesize 64.00 KiB.
>> [ 0:13.847]   Logical volume "LV1" created.
>> [ 0:14.499]   WARNING: This metadata update is NOT backed up.
>> [ 0:14.499] 6,8925,1187444256,-;device-mapper: raid: Superblocks created for new raid set
>> [ 0:14.499] 5,8926,1187449525,-;md/raid10:mdX: not clean -- starting background reconstruction
>> [ 0:14.499] 6,8927,1187450148,-;md/raid10:mdX: active with 4 out of 4 devices
>> [ 0:14.499] 6,8928,1187452472,-;md: resync of RAID array mdX
>> [ 0:14.499] 6,8929,1187453016,-;md: mdX: resync done.
>> [ 0:14.499] 4,8930,1187555486,-;mdX: bitmap file is out of date, doing full recovery
>> [ 0:14.499] aux disable_dev "$dev4"
>> [ 0:14.515] #lvconvert-repair-raid.sh:163+ aux disable_dev /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv4
>> [ 0:14.515] Disabling device /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv4 (253:5)
>> [ 0:14.554] not lvconvert -y --repair $vg/$lv1
>>
>> To me the important-looking difference (and I admit, I'm no RAID expert)
>> is that in the case where the test passes (where lvconvert fails as
>> expected), I see
>>
>> [ 0:14.042] 4,8911,1194459161,-;mdX: bitmap file is out of date, doing full recovery
>> [ 0:14.042] 6,8912,1194563810,-;md: resync of RAID array mdX
>>
>> When it fails I see:
>>
>> [ 0:14.499] 6,8928,1187452472,-;md: resync of RAID array mdX
>> [ 0:14.499] 6,8929,1187453016,-;md: mdX: resync done.
>> [ 0:14.499] 4,8930,1187555486,-;mdX: bitmap file is out of date, doing full recovery
>>
>> Which appears to show a resync that takes no time, presumably because it
>> happens before the device notices that the bitmaps are wrong and
>> schedules a full recovery.
>>
>>
>> lvconvert-raid-reshape-stripes-load-reload.sh repeatedly reloads the
>> device table during a raid reshape, and then tests the filesystem for
>> corruption afterwards. With this patchset, the filesystem is
>> occasionally corrupted.  I do not see this with the 6.6 kernel.

I tried this and I see a lot of the following in dmesg:

[ 1341.409813] dm-raid456: io failed across reshape position while reshape can't make progress

So this test is reading the array (raid5) while the reshape is
interrupted, and this IO error was added in patch 12. I think the two
are related, but I'm not sure yet whether this IO error is the reason
the test failed.

In v6.6, this test reads across the reshape position and the user
might read wrong data;

In v6.8-rc1, this test will deadlock while reading across the reshape
position;

In v6.8-rc1 with this patchset, reading across the reshape position
will get an IO error.

Can you check whether fsck can run while the reshape is interrupted in
the test? If so, can we change the test so that fsck is only run while
the reshape is running (MD_RECOVERY_RUNNING) or after the reshape is done?
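
A minimal sketch of such a gate, assuming the test can read the dm-raid
status line from `dmsetup status <device>` and that fields 7 and 8 of that
line are the sync ratio and the sync action, as described in the kernel's
dm-raid documentation. The helper name fsck_allowed is hypothetical, and
treating "reshape" as "reshape running" and "idle with a complete ratio" as
"reshape done" is an assumption, not something confirmed in this thread:

```shell
# fsck_allowed STATUS_LINE
# STATUS_LINE is one line as printed by `dmsetup status <device>`, e.g.
#   0 131072 raid raid5_ls 4 AAAA 65536/65536 idle 0 0 -
# Returns 0 if fsck may run, 1 if it should be skipped,
# 2 if the line does not describe a dm-raid target.
fsck_allowed() {
    printf '%s\n' "$1" | awk '
        $3 != "raid" { exit 2 }                # not a dm-raid status line
        {
            split($7, ratio, "/")              # sync ratio is "done/total"
            if ($8 == "reshape")
                exit 0                         # assumed: reshape is making progress
            if ($8 == "idle" && ratio[1] == ratio[2])
                exit 0                         # assumed: nothing left to sync
            exit 1                             # frozen, or idle with work pending
        }'
}
```

In the test this could wrap the existing fsck call, for example
`fsck_allowed "$(dmsetup status "$vg-$lv1")" && fsck -fn ...`, where the
device name and the fsck arguments are placeholders for whatever the test
already uses.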

Thanks,
Kuai

>>
>> -Ben
>>> Xiao Ni also tested the last version on a real machine, see [1].
>>>
>>> [1] https://lore.kernel.org/all/CALTww29QO5kzmN6Vd+jT=-8W5F52tJjHKSgrfUc1Z1ZAeRKHHA@mail.gmail.com/
>>>
>>>
>>> Yu Kuai (14):
>>>    md: don't ignore suspended array in md_check_recovery()
>>>    md: don't ignore read-only array in md_check_recovery()
>>>    md: make sure md_do_sync() will set MD_RECOVERY_DONE
>>>    md: don't register sync_thread for reshape directly
>>>    md: don't suspend the array for interrupted reshape
>>>    md: fix missing release of 'active_io' for flush
>>>    md: export helpers to stop sync_thread
>>>    md: export helper md_is_rdwr()
>>>    dm-raid: really frozen sync_thread during suspend
>>>    md/dm-raid: don't call md_reap_sync_thread() directly
>>>    dm-raid: add a new helper prepare_suspend() in md_personality
>>>    md/raid456: fix a deadlock for dm-raid456 while io concurrent with
>>>      reshape
>>>    dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>    dm-raid: remove mddev_suspend/resume()
>>>
>>>   drivers/md/dm-raid.c |  78 +++++++++++++++++++--------
>>>   drivers/md/md.c      | 126 +++++++++++++++++++++++++++++--------------
>>>   drivers/md/md.h      |  16 ++++++
>>>   drivers/md/raid10.c  |  16 +-----
>>>   drivers/md/raid5.c   |  61 +++++++++++----------
>>>   5 files changed, 192 insertions(+), 105 deletions(-)
>>>
>>> -- 
>>> 2.39.2
>>>