Message-ID: <b1487963-a07e-4850-98f7-0eda07c31214@molgen.mpg.de>
Date: Mon, 15 Sep 2025 09:19:39 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Kenta Akagi <k@...l.me>
Cc: Song Liu <song@...nel.org>, Yu Kuai <yukuai3@...wei.com>,
Mariusz Tkaczyk <mtkaczyk@...nel.org>, Shaohua Li <shli@...com>,
Guoqing Jiang <jgq516@...il.com>, linux-raid@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 9/9] md/raid1,raid10: Fix: Operation continuing on 0
devices.
Dear Kenta,
Thank you for your patch, and for the well-written commit message and test.

Reading just the summary, I didn’t know what it was about. Should you
resend, maybe:
> md/raid1,raid10: Fix logging `Operation continuing on 0 devices.`
On 15.09.25 at 05:42, Kenta Akagi wrote:
> Since commit 9a567843f7ce ("md: allow last device to be forcibly
> removed from RAID1/RAID10."), RAID1/10 arrays can now lose all rdevs.
>
> Before that commit, losing the array's last rdev, or reaching the end
> of raid{1,10}_error without taking an early return, could never
> happen. However, both situations can occur in the current
> implementation.
>
> As a result, when mddev->fail_last_dev is set, a spurious pr_crit
> message can be printed.
>
> This patch prevents "Operation continuing" from being printed if the
> array is not operational.
>
> root@...ora:~# mdadm --create --verbose /dev/md0 --level=1 \
> --raid-devices=2 /dev/loop0 /dev/loop1
> mdadm: Note: this array has metadata at the start and
> may not be suitable as a boot device. If you plan to
> store '/boot' on this device please ensure that
> your boot-loader understands md/v1.x metadata, or use
> --metadata=0.90
> mdadm: size set to 1046528K
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md0 started.
> root@...ora:~# echo 1 > /sys/block/md0/md/fail_last_dev
> root@...ora:~# mdadm --fail /dev/md0 loop0
> mdadm: set loop0 faulty in /dev/md0
> root@...ora:~# mdadm --fail /dev/md0 loop1
> mdadm: set device faulty failed for loop1: Device or resource busy
> root@...ora:~# dmesg | tail -n 4
> [ 1314.359674] md/raid1:md0: Disk failure on loop0, disabling device.
> md/raid1:md0: Operation continuing on 1 devices.
> [ 1315.506633] md/raid1:md0: Disk failure on loop1, disabling device.
> md/raid1:md0: Operation continuing on 0 devices.
Shouldn’t the first line of the critical log still be printed? Your
patch would remove it as well.
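If the goal is only to suppress the misleading device count, an
untested alternative sketch could be to split the pr_crit() into two
calls inside raid1_error(), so the disk-failure notice is always
printed:

	/* Untested sketch: always announce the failure, but only print
	 * the device count while the array is still operational.
	 */
	pr_crit("md/raid1:%s: Disk failure on %pg, disabling device.\n",
		mdname(mddev), rdev->bdev);
	if ((conf->raid_disks - mddev->degraded) > 0)
		pr_crit("md/raid1:%s: Operation continuing on %d devices.\n",
			mdname(mddev), conf->raid_disks - mddev->degraded);

(The same split would presumably apply to raid10_error(), using
enough(conf, -1) and conf->geo.raid_disks there.)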
> root@...ora:~#
>
> Fixes: 9a567843f7ce ("md: allow last device to be forcibly removed from RAID1/RAID10.")
> Signed-off-by: Kenta Akagi <k@...l.me>
> ---
> drivers/md/raid1.c | 9 +++++----
> drivers/md/raid10.c | 9 +++++----
> 2 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index febe2849a71a..b3c845855841 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1803,6 +1803,11 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
> update_lastdev(conf);
> }
> set_bit(Faulty, &rdev->flags);
> + if ((conf->raid_disks - mddev->degraded) > 0)
> + pr_crit("md/raid1:%s: Disk failure on %pg, disabling device.\n"
> + "md/raid1:%s: Operation continuing on %d devices.\n",
> + mdname(mddev), rdev->bdev,
> + mdname(mddev), conf->raid_disks - mddev->degraded);
> spin_unlock_irqrestore(&conf->device_lock, flags);
> /*
> * if recovery is running, make sure it aborts.
> @@ -1810,10 +1815,6 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev)
> set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> set_mask_bits(&mddev->sb_flags, 0,
> BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> - pr_crit("md/raid1:%s: Disk failure on %pg, disabling device.\n"
> - "md/raid1:%s: Operation continuing on %d devices.\n",
> - mdname(mddev), rdev->bdev,
> - mdname(mddev), conf->raid_disks - mddev->degraded);
> }
>
> static void print_conf(struct r1conf *conf)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index be5fd77da3e1..4f3ef43ebd2a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2059,11 +2059,12 @@ static void raid10_error(struct mddev *mddev, struct md_rdev *rdev)
> set_bit(Faulty, &rdev->flags);
> set_mask_bits(&mddev->sb_flags, 0,
> BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
> + if (enough(conf, -1))
> + pr_crit("md/raid10:%s: Disk failure on %pg, disabling device.\n"
> + "md/raid10:%s: Operation continuing on %d devices.\n",
> + mdname(mddev), rdev->bdev,
> + mdname(mddev), conf->geo.raid_disks - mddev->degraded);
> spin_unlock_irqrestore(&conf->device_lock, flags);
> - pr_crit("md/raid10:%s: Disk failure on %pg, disabling device.\n"
> - "md/raid10:%s: Operation continuing on %d devices.\n",
> - mdname(mddev), rdev->bdev,
> - mdname(mddev), conf->geo.raid_disks - mddev->degraded);
> }
>
> static void print_conf(struct r10conf *conf)
Kind regards,
Paul