Message-ID: <CALTww2_z7UGXJ+ppYXrkAY8bpVrV9O3z0VfoaTOZtmX1-DXiZA@mail.gmail.com>
Date: Mon, 15 Sep 2025 08:33:19 +0800
From: Xiao Ni <xni@...hat.com>
To: linan666@...weicloud.com
Cc: corbet@....net, song@...nel.org, yukuai3@...wei.com, linan122@...wei.com, 
	hare@...e.de, martin.petersen@...cle.com, bvanassche@....org, 
	filipe.c.maia@...il.com, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org, 
	yangerkun@...wei.com, yi.zhang@...wei.com
Subject: Re: [PATCH v4 2/2] md: allow configuring logical_block_size

Hi Nan

On Thu, Sep 11, 2025 at 3:41 PM <linan666@...weicloud.com> wrote:
>
> From: Li Nan <linan122@...wei.com>
>
> Previously, a raid array used the maximum logical_block_size (LBS) of
> all member disks. Adding a disk with a larger LBS at runtime could
> unexpectedly increase the array's LBS, risking corruption of existing
> partitions.

Could you describe the problem in more detail? It would be better to
give some test steps that can be used to reproduce it.
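For example, the array's LBS can be observed with the BLKSSZGET ioctl
before and after hot-adding a larger-LBS disk. A minimal userspace
helper for that (illustrative only, not part of the patch):

/* check_lbs.c - print a block device's logical block size */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	int fd, lbs;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKSSZGET, &lbs) < 0) {
		perror("BLKSSZGET");
		close(fd);
		return 1;
	}
	printf("%s: logical_block_size = %d\n", argv[1], lbs);
	close(fd);
	return 0;
}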
>
> Simply rejecting disks with a larger LBS is inflexible. In some
> scenarios, only 512-byte-LBS disks are available at creation time, but
> 4k-LBS disks may be added to the array later.
>
> Making LBS configurable is the best way to handle this scenario.
> After this patch, md will:
>   - store the LBS in the on-disk metadata.
>   - add a read-write sysfs attribute 'mdX/logical_block_size'.
>
> Future mdadm should support setting the LBS via the metadata field
> during RAID creation and via the new sysfs attribute. Though the kernel
> allows runtime LBS changes, users should avoid modifying the LBS after
> creating partitions or filesystems, to prevent compatibility issues.

Since the value can only be set when creating an array, can this be
done automatically in kernel space?
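For reference, a rough sketch of what doing this automatically might
look like (a hypothetical helper built on existing md/block-layer
accessors, not something in this patch; the limit stacking already
takes the max of all members, so this would merely pick that value at
creation time and let it be persisted in the superblock):

/*
 * Hypothetical sketch: derive a default LBS from the current
 * members when the array is created, instead of requiring a
 * write to the new sysfs attribute.
 */
static unsigned int md_default_lbs(struct mddev *mddev)
{
	struct md_rdev *rdev;
	unsigned int lbs = 512;

	rdev_for_each(rdev, mddev)
		lbs = max(lbs, queue_logical_block_size(
				bdev_get_queue(rdev->bdev)));
	return lbs;
}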

Best Regards
Xiao
>
> Note that many RAID paths rely on PAGE_SIZE alignment, including
> metadata I/O. A logical_block_size larger than PAGE_SIZE would result
> in metadata read/write failures, so this configuration must be rejected.
>
> Only 1.x metadata supports a configurable logical_block_size. 0.90
> metadata initializes all fields to defaults at auto-detect. Supporting
> 0.90 would require more extensive changes, and no such use case has
> been observed.
>
> Signed-off-by: Li Nan <linan122@...wei.com>
> ---
>  Documentation/admin-guide/md.rst |  7 +++
>  drivers/md/md.h                  |  1 +
>  include/uapi/linux/raid/md_p.h   |  3 +-
>  drivers/md/md-linear.c           |  1 +
>  drivers/md/md.c                  | 75 ++++++++++++++++++++++++++++++++
>  drivers/md/raid0.c               |  1 +
>  drivers/md/raid1.c               |  1 +
>  drivers/md/raid10.c              |  1 +
>  drivers/md/raid5.c               |  1 +
>  9 files changed, 90 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
> index 1c2eacc94758..f5c81fad034a 100644
> --- a/Documentation/admin-guide/md.rst
> +++ b/Documentation/admin-guide/md.rst
> @@ -238,6 +238,13 @@ All md devices contain:
>       the number of devices in a raid4/5/6, or to support external
>       metadata formats which mandate such clipping.
>
> +  logical_block_size
> +     Configures the array's logical block size in bytes. This attribute
> +     is only supported for RAID1, RAID5 and RAID10 with 1.x metadata.
> +     The value must be written before starting the array. The final
> +     array LBS is the maximum of this value and the LBS of all rdevs.
> +     Note that the LBS cannot exceed PAGE_SIZE.
> +
>    reshape_position
>       This is either ``none`` or a sector number within the devices of
>       the array where ``reshape`` is up to.  If this is set, the three
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index afb25f727409..b0147b98c8d3 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -432,6 +432,7 @@ struct mddev {
>         sector_t                        array_sectors; /* exported array size */
>         int                             external_size; /* size managed
>                                                         * externally */
> +       unsigned int                    logical_block_size;
>         __u64                           events;
>         /* If the last 'event' was simply a clean->dirty transition, and
>          * we didn't write it to the spares, then it is safe and simple
> diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
> index ac74133a4768..310068bb2a1d 100644
> --- a/include/uapi/linux/raid/md_p.h
> +++ b/include/uapi/linux/raid/md_p.h
> @@ -291,7 +291,8 @@ struct mdp_superblock_1 {
>         __le64  resync_offset;  /* data before this offset (from data_offset) known to be in sync */
>         __le32  sb_csum;        /* checksum up to devs[max_dev] */
>         __le32  max_dev;        /* size of devs[] array to consider */
> -       __u8    pad3[64-32];    /* set to 0 when writing */
> +       __le32  logical_block_size;     /* same as q->limits->logical_block_size */
> +       __u8    pad3[64-36];    /* set to 0 when writing */
>
>         /* device state information. Indexed by dev_number.
>          * 2 bytes per device
> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> index 5d9b08115375..da8babb8da59 100644
> --- a/drivers/md/md-linear.c
> +++ b/drivers/md/md-linear.c
> @@ -72,6 +72,7 @@ static int linear_set_limits(struct mddev *mddev)
>
>         md_init_stacking_limits(&lim);
>         lim.max_hw_sectors = mddev->chunk_sectors;
> +       lim.logical_block_size = mddev->logical_block_size;
>         lim.max_write_zeroes_sectors = mddev->chunk_sectors;
>         lim.io_min = mddev->chunk_sectors << 9;
>         err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 40f56183c744..e0184942c8ec 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1963,6 +1963,7 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *freshest, struc
>                 mddev->layout = le32_to_cpu(sb->layout);
>                 mddev->raid_disks = le32_to_cpu(sb->raid_disks);
>                 mddev->dev_sectors = le64_to_cpu(sb->size);
> +               mddev->logical_block_size = le32_to_cpu(sb->logical_block_size);
>                 mddev->events = ev1;
>                 mddev->bitmap_info.offset = 0;
>                 mddev->bitmap_info.space = 0;
> @@ -2172,6 +2173,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
>         sb->chunksize = cpu_to_le32(mddev->chunk_sectors);
>         sb->level = cpu_to_le32(mddev->level);
>         sb->layout = cpu_to_le32(mddev->layout);
> +       sb->logical_block_size = cpu_to_le32(mddev->logical_block_size);
>         if (test_bit(FailFast, &rdev->flags))
>                 sb->devflags |= FailFast1;
>         else
> @@ -5900,6 +5902,66 @@ static struct md_sysfs_entry md_serialize_policy =
>  __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show,
>         serialize_policy_store);
>
> +static int mddev_set_logical_block_size(struct mddev *mddev,
> +                               unsigned int lbs)
> +{
> +       int err = 0;
> +       struct queue_limits lim;
> +
> +       if (queue_logical_block_size(mddev->gendisk->queue) >= lbs) {
> +               pr_err("%s: incompatible logical_block_size %u, can not set\n",
> +                      mdname(mddev), lbs);
> +               return -EINVAL;
> +       }
> +
> +       lim = queue_limits_start_update(mddev->gendisk->queue);
> +       lim.logical_block_size = lbs;
> +       pr_info("%s: logical_block_size is being changed, data may be lost\n",
> +               mdname(mddev));
> +       err = queue_limits_commit_update(mddev->gendisk->queue, &lim);
> +       if (err)
> +               return err;
> +
> +       mddev->logical_block_size = lbs;
> +       return 0;
> +}
> +
> +static ssize_t
> +lbs_show(struct mddev *mddev, char *page)
> +{
> +       return sprintf(page, "%u\n", mddev->logical_block_size);
> +}
> +
> +static ssize_t
> +lbs_store(struct mddev *mddev, const char *buf, size_t len)
> +{
> +       unsigned int lbs;
> +       int err = -EBUSY;
> +
> +       /* Only 1.x meta supports configurable LBS */
> +       if (mddev->major_version == 0)
> +               return -EINVAL;
> +
> +       if (mddev->pers)
> +               return -EBUSY;
> +
> +       err = kstrtouint(buf, 10, &lbs);
> +       if (err < 0)
> +               return -EINVAL;
> +
> +       err = mddev_lock(mddev);
> +       if (err)
> +               goto unlock;
> +
> +       err = mddev_set_logical_block_size(mddev, lbs);
> +
> +unlock:
> +       mddev_unlock(mddev);
> +       return err ?: len;
> +}
> +
> +static struct md_sysfs_entry md_logical_block_size =
> +__ATTR(logical_block_size, S_IRUGO|S_IWUSR, lbs_show, lbs_store);
>
>  static struct attribute *md_default_attrs[] = {
>         &md_level.attr,
> @@ -5933,6 +5995,7 @@ static struct attribute *md_redundancy_attrs[] = {
>         &md_scan_mode.attr,
>         &md_last_scan_mode.attr,
>         &md_mismatches.attr,
> +       &md_logical_block_size.attr,
>         &md_sync_min.attr,
>         &md_sync_max.attr,
>         &md_sync_io_depth.attr,
> @@ -6052,6 +6115,17 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
>                         return -EINVAL;
>         }
>
> +       /*
> +        * Until RAID adds folio support, the logical_block_size
> +        * must not be larger than the page size.
> +        */
> +       if (lim->logical_block_size > PAGE_SIZE) {
> +               pr_err("%s: logical_block_size must not larger than PAGE_SIZE\n",
> +                       mdname(mddev));
> +               return -EINVAL;
> +       }
> +       mddev->logical_block_size = lim->logical_block_size;
> +
>         return 0;
>  }
>  EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
> @@ -6690,6 +6764,7 @@ static void md_clean(struct mddev *mddev)
>         mddev->chunk_sectors = 0;
>         mddev->ctime = mddev->utime = 0;
>         mddev->layout = 0;
> +       mddev->logical_block_size = 0;
>         mddev->max_disks = 0;
>         mddev->events = 0;
>         mddev->can_decrease_events = 0;
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f1d8811a542a..705889a09fc1 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -382,6 +382,7 @@ static int raid0_set_limits(struct mddev *mddev)
>         md_init_stacking_limits(&lim);
>         lim.max_hw_sectors = mddev->chunk_sectors;
>         lim.max_write_zeroes_sectors = mddev->chunk_sectors;
> +       lim.logical_block_size = mddev->logical_block_size;
>         lim.io_min = mddev->chunk_sectors << 9;
>         lim.io_opt = lim.io_min * mddev->raid_disks;
>         lim.chunk_sectors = mddev->chunk_sectors;
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d0f6afd2f988..de0c843067dc 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -3223,6 +3223,7 @@ static int raid1_set_limits(struct mddev *mddev)
>
>         md_init_stacking_limits(&lim);
>         lim.max_write_zeroes_sectors = 0;
> +       lim.logical_block_size = mddev->logical_block_size;
>         lim.features |= BLK_FEAT_ATOMIC_WRITES;
>         err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
>         if (err)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index c3cfbb0347e7..68c8148386b0 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4005,6 +4005,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
>
>         md_init_stacking_limits(&lim);
>         lim.max_write_zeroes_sectors = 0;
> +       lim.logical_block_size = mddev->logical_block_size;
>         lim.io_min = mddev->chunk_sectors << 9;
>         lim.chunk_sectors = mddev->chunk_sectors;
>         lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c32ffd9cffce..ff0daa22df65 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7747,6 +7747,7 @@ static int raid5_set_limits(struct mddev *mddev)
>         stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
>
>         md_init_stacking_limits(&lim);
> +       lim.logical_block_size = mddev->logical_block_size;
>         lim.io_min = mddev->chunk_sectors << 9;
>         lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
>         lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
> --
> 2.39.2
>

