lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ilb5dqq1.fsf@metaspace.dk>
Date:   Fri, 30 Jun 2023 13:53:58 +0200
From:   "Andreas Hindborg (Samsung)" <nmi@...aspace.dk>
To:     Damien Le Moal <dlemoal@...nel.org>
Cc:     Ming Lei <ming.lei@...hat.com>,
        Hans Holmberg <Hans.Holmberg@....com>,
        Aravind Ramesh <Aravind.Ramesh@....com>,
        Jens Axboe <axboe@...nel.dk>,
        "open list:BLOCK LAYER" <linux-block@...r.kernel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Matias Bjorling <Matias.Bjorling@....com>,
        open list <linux-kernel@...r.kernel.org>, gost.dev@...sung.com,
        Minwoo Im <minwoo.im.dev@...il.com>
Subject: Re: [PATCH v4 3/4] ublk: enable zoned storage support


Damien Le Moal <dlemoal@...nel.org> writes:

> On 6/29/23 16:50, Andreas Hindborg (Samsung) wrote:
>>>>  UCLINUX (M68KNOMMU AND COLDFIRE)
>>>> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
>>>> index 5b9d4aaebb81..c58dfd035557 100644
>>>> --- a/drivers/block/Kconfig
>>>> +++ b/drivers/block/Kconfig
>>>> @@ -402,6 +402,10 @@ config BLKDEV_UBLK_LEGACY_OPCODES
>>>>  	  suggested to enable N if your application(ublk server) switches to
>>>>  	  ioctl command encoding.
>>>>  
>>>> +config BLK_DEV_UBLK_ZONED
>>>> +	def_bool y
>>>
>>> This can be "bool" only.
>> 
>> I did it like this because I wanted BLK_DEV_UBLK_ZONED to be set
>> automatically to "y" when BLK_DEV_UBLK and BLK_DEV_ZONED are enabled.
>> BLK_DEV_UBLK_ZONED has no menu entry. If we change it to "bool" it will
>> be default "n" and we would have to make it appear in menuconfig to be
>> manually enabled.
>> 
>> Isn't it more sane if it is just unconditionally enabled when
>> BLK_DEV_ZONED and BLK_DEV_UBLK is enabled?
>
> Maybe something like this then:
>
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index 5b9d4aaebb81..105e1121bf5f 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -370,9 +370,13 @@ config BLK_DEV_RBD
>
>           If unsure, say N.
>
> +config BLK_DEV_UBLK_ZONED
> +       bool
> +
>  config BLK_DEV_UBLK
>         tristate "Userspace block driver (Experimental)"
>         select IO_URING
> +       select BLK_DEV_UBLK_ZONED if BLK_DEV_ZONED
>         help
>           io_uring based userspace block driver. Together with ublk server, ublk
>           has been working well, but interface with userspace or command data

OK 👍

>>>> +		ub->ub_disk->nr_zones = p->dev_sectors / p->chunk_sectors;
>>>> +}
>>>> +
>>>> +void ublk_dev_param_zoned_apply(struct ublk_device *ub)
>>>> +{
>>>> +	const struct ublk_param_zoned *p = &ub->params.zoned;
>>>> +
>>>> +	if (ub->dev_info.flags & UBLK_F_ZONED) {
>>>> +		disk_set_max_active_zones(ub->ub_disk, p->max_active_zones);
>>>> +		disk_set_max_open_zones(ub->ub_disk, p->max_open_zones);
>>>
>>> You do not need to check if the max_active_zones and max_open_zones values are
>>> sensible ? E.g. what if they are larger than the number of zones ?
>> 
>> I don't think it will be a problem. I assume you can set them to 0 to
>> indicate infinite. If user space sets them to more than the actual
>> number of available zones (that user space set up also), user space will
>> have to handle that when it gets a zone request for a zone outside the
>> logical drive LBA space. I don't think the kernel has to care. Will this
>> break anything kernel side?
>
> zonefs and btrfs look at these limits. So I would rather prefer seeing sensible
> values. 0 == no limit is fine. But seeing maz or moz > nr_zones for the device
> is just too strange and I would prefer an error in that case to let the user
> know it is not doing something right. Getting moz > maz is also nonsense that
> needs to be checked. There is one case we could silently change: if maz ==
> nr_zones or moz == nr_zones, then that essentially means "no limit", so we could
> use 0 to let the in-kernel users that they do not need to care about the limits.

OK, will check.

>
>> 
>>>
>>>> +	}
>>>> +}
>>>> +
>>>> +int ublk_revalidate_disk_zones(struct gendisk *disk)
>>>> +{
>>>> +	return blk_revalidate_disk_zones(disk, NULL);
>>>> +}
>>>
>>> I do not think this helper is needed at all (see below comment on the call site).
>> 
>> This is required because the prototype for `blk_revalidate_disk_zones()`
>> is not defined when BLK_DEV_ZONED is not defined. Without this helper
>> for which the prototype is always defined, we will have a compile error
>> at the call site.
>
> You have this hunk in ublk_ctrl_start_dev:
>
> disk_set_zoned(disk, BLK_ZONED_HM);
> blk_queue_required_elevator_features(disk->queue, ELEVATOR_F_ZBD_SEQ_WRITE);
> ret = ublk_revalidate_disk_zones(disk);
> if (ret)
> 	goto out_put_disk;
>
> And that is called under "if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)". So I do not see
> how you can get compile errors using directly blk_revalidate_disk_zones(). The
> entire hunk above should be a helper function I think.

The prototype for `disk_set_zoned()` is defined independent of
CONFIG_BLK_DEV_ZONED. The prototype for `blk_revalidate_disk_zones()` is
only defined when CONFIG_BLK_DEV_ZONED is enabled. It is not the same.
We can't have a call to `blk_revalidate_disk_zones()` in a translation
unit when the prototype is not defined, even behind a `if
(IS_ENABLED())` guard. It would be fine behind an `#ifdef` though.

If we move the entire hunk to a helper for which the prototype is always
defined, we should be able to call `blk_revalidate_disk_zones()`
directly 👍 I'll see if I can do that.

>>>> +
>>>> +	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>>>> +		ublk_set_nr_zones(ub);
>>>
>>> So if the user is attempting to setup a zoned drive but the kernel does not have
>>> CONFIG_BLK_DEV_ZONED=y, the user setup will be silently ignored. Not exactly
>>> nice I think, unless I am missing something.
>> 
>> We have this in `ublk_ctrl_add_dev()`:
>> 
>> 	if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> 		ub->dev_info.flags &= ~UBLK_F_ZONED;
>> 
>> User space is supposed to check the flags after the call to know the
>> kernel capabilities.
>
> And you trust user space to be always correct ? I never do :)
> So definitely, check and error if not supported. No silently clearing the device
> zoned flag.
>

I'll make it fail instead of just clearing the flag 👍

>
>>> Also, repeating that "if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))" for all zone
>>> related functions is very verbose. Stub the functions in ublk_drv.h. That will
>>> make the main C code lighter.
>> 
>> Not sure what you mean "stub the functions". Like so:
>> 
>> @@ -216,8 +216,7 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
>>  
>>  	set_capacity(ub->ub_disk, p->dev_sectors);
>>  
>> -	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> -		ublk_set_nr_zones(ub);
>> +	ublk_config_nr_zones(ub);
>>  }
>> 
>> And then in the header:
>> 
>> void ublk_config_nr_zones(struct ublk_device *ub)
>> {
>> 	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>> 		ublk_set_nr_zones(ub);
>> }
>> 
>> Or something else?
>
> #ifndef CONFIG_BLK_DEV_ZONED
> static inline void ublk_set_nr_zones() {}
> #endif
>
> That avoid repeating that if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) all over the
> place. The stubbed functions can be called only for "if (ub->dev_info.flags &
> UBLK_F_ZONED", which is OK since this flag is never supposed to be accepted and
> set for the CONFIG_BLK_DEV_ZONED=n case.

Alright 👍

>
>>>> +		disk_set_zoned(disk, BLK_ZONED_HM);
>>>> +		blk_queue_required_elevator_features(disk->queue, ELEVATOR_F_ZBD_SEQ_WRITE);
>>>> +		ret = ublk_revalidate_disk_zones(disk);
>>>> +		if (ret)
>>>> +			goto out_put_disk;
>>>
>>> This should be all a helper ublk_set_zoned() or something.
>> 
>> Ok 👍 Unfortunately this block of code is split up in the zone append
>> patch. I will see what I can do, maybe 2 functions.
>
> Beware that I just posted patches that require zone size (chunk_sectors limit)
> and max append sectors limit to be set *before* calling
> blk_revalidate_disk_zones(). Otherwise, you will get an error.

Got it, thanks.

>
>>>>  	get_device(&ub->cdev_dev);
>>>>  	ub->dev_info.state = UBLK_S_DEV_LIVE;
>>>>  	ret = add_disk(disk);
>>>> @@ -1997,6 +2046,9 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>>>>  	if (ub->dev_info.flags & UBLK_F_USER_COPY)
>>>>  		ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA;
>>>>  
>>>> +	if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
>>>> +		ub->dev_info.flags &= ~UBLK_F_ZONED;
>>>
>>> Arg, no. The user should be notified with an error that he/she is attempting to
>>> create a zoned device that cannot be supported.
>> 
>> As I wrote above, this is part of the ublk uapi. This is the way user
>> space does kernel capability check. User space will check the flags when
>> this call returns. @Ming did I understand this correctly or did I miss something?
>
> OK. So this is like a virtio capability flags thing ? Negotiation between device
> and driver ?

I'm not familiar with that one, but I think you are right we should fail
the request here. I makes no sense to configure a regular ublk device if
user space wants to set up a zoned one. Failing is more sensible.

>
>>>> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
>>>> index 471b3b983045..436525afffe8 100644
>>>> --- a/include/uapi/linux/ublk_cmd.h
>>>> +++ b/include/uapi/linux/ublk_cmd.h
>>>> @@ -176,6 +176,11 @@
>>>>  /* Copy between request and user buffer by pread()/pwrite() */
>>>>  #define UBLK_F_USER_COPY	(1UL << 7)
>>>>  
>>>> +/*
>>>> + * Enable zoned device support
>>>
>>> Isn't this for "Indicate that the device is zoned" ?
>> 
>> User space sets this flag when setting up the device to enable zoned
>> storage support. Kernel may reject it. I think the wording is
>> sufficient. Could be updated to:
>> 
>> "User space sets this flag when setting up the device to request zoned
>>  storage support. Kernel may deny the request by clearing the flag
>>  during setup."
>
> I really have a big problem with this. If the user driver does not check that
> the flag was cleared, the user will assume that the device is zoned, but it is
> not. This is not super nice... I would really prefer just failing the devie
> creation if zoned was requested but cannot be satisfied, either because the
> kernel does not support zoned block devices or when the user specified something
> nonsensical.

Changed to:

/*
 * User space sets this flag when setting up the device to request zoned storage support. Kernel may
 * deny the request by returning an error.
 */
#define UBLK_F_ZONED (1ULL << 8)

Best regards,
Andreas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ