[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <BY5PR21MB1506A8B3AA10BDC0CC87A399CE119@BY5PR21MB1506.namprd21.prod.outlook.com>
Date: Fri, 16 Jul 2021 21:19:37 +0000
From: Long Li <longli@...rosoft.com>
To: Michael Kelley <mikelley@...rosoft.com>,
"longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
CC: Jonathan Corbet <corbet@....net>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Hans de Goede <hdegoede@...hat.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
Maximilian Luz <luzmaximilian@...il.com>,
Mike Rapoport <rppt@...nel.org>,
Ben Widawsky <ben.widawsky@...el.com>,
Jiri Slaby <jirislaby@...nel.org>,
Andra Paraschiv <andraprs@...zon.com>,
Siddharth Gupta <sidgup@...eaurora.org>,
Hannes Reinecke <hare@...e.de>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: RE: [Patch v3 2/3] Drivers: hv: add Azure Blob driver
> Subject: RE: [Patch v3 2/3] Drivers: hv: add Azure Blob driver
>
> From: Long Li <longli@...rosoft.com> Sent: Friday, July 16, 2021 12:27 PM
>
> [snip]
>
> > > > +
> > > > +static int az_blob_connect_to_vsp(struct hv_device *device, u32
> > > > +ring_size) {
> > > > + int ret;
> > > > +
> > > > + spin_lock_init(&az_blob_dev.file_lock);
> > > > + INIT_LIST_HEAD(&az_blob_dev.file_list);
> > > > + init_waitqueue_head(&az_blob_dev.file_wait);
> > > > +
> > > > + az_blob_dev.device = device;
> > > > + device->channel->rqstor_size = AZ_BLOB_QUEUE_DEPTH;
> > > > +
> > > > + ret = vmbus_open(device->channel, ring_size, ring_size, NULL, 0,
> > > > + az_blob_on_channel_callback, device->channel);
> > > > +
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + hv_set_drvdata(device, &az_blob_dev);
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static void az_blob_remove_vmbus(struct hv_device *device) {
> > > > + /* At this point, no VSC/VSP traffic is possible over vmbus */
> > > > + hv_set_drvdata(device, NULL);
> > > > + vmbus_close(device->channel);
> > > > +}
> > > > +
> > > > +static int az_blob_probe(struct hv_device *device,
> > > > + const struct hv_vmbus_device_id *dev_id) {
> > > > + int ret;
> > > > +
> > > > + ret = az_blob_connect_to_vsp(device, AZ_BLOB_RING_SIZE);
> > > > + if (ret) {
> > > > + az_blob_err("error connecting to VSP ret=%d\n", ret);
> > > > + return ret;
> > > > + }
> > > > +
> > > > + // create user-mode client library facing device
> > > > + ret = az_blob_create_device(&az_blob_dev);
> > > > + if (ret) {
> > > > + az_blob_err("failed to create device ret=%d\n", ret);
> > > > + az_blob_remove_vmbus(device);
> > > > + return ret;
> > > > + }
> > > > +
> > > > + az_blob_dev.removing = false;
> > >
> > > This line seems misplaced. As soon as az_blob_create_device()
> > > returns, some other thread could find the device and open it. You
> > > don't want the
> > > az_blob_fop_open() function to find the "removing"
> > > flag set to true. So I think this line should go *before* the call
> > > to az_blob_create_device().
> > >
> > > > + az_blob_info("successfully probed device\n");
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int az_blob_remove(struct hv_device *dev) {
> > > > + az_blob_dev.removing = true;
> > > > + /*
> > > > + * RCU lock guarantees that any future calls to az_blob_fop_open()
> > > > + * can not use device resources while the inode reference of
> > > > + * /dev/azure_blob is being held for that open, and device file is
> > > > + * being removed from /dev.
> > > > + */
> > > > + synchronize_rcu();
> > >
> > > I don't think this works as you have described. While it will
> > > ensure that any in-progress RCU read-side critical sections have
> > > completed (i.e., in
> > > az_blob_fop_open() ), it does not prevent new read-side critical
> > > sections from being started. So as soon as synchronize_rcu() is
> > > finished, another thread could find and open the device, and be
> > > executing in az_blob_fop_open().
> > >
> > > But it's not clear to me that this (and the rcu_read_locks in
> > > az_blob_fop_open) are really needed anyway. If
> > > az_blob_remove_device() is called while one or more threads have it
> > > open, I think that's OK. Or is there a scenario that I'm missing?
> >
> > I was trying to address your comment earlier. Here were your comments (1
> - 7):
> >
> > 1) The driver is unbound, so az_blob_remove() is called.
> > 2) az_blob_remove() sets the "removing" flag to true, and calls
> az_blob_remove_device().
> > 3) az_blob_remove_device() waits for the file_list to become empty.
> > 4) After the file_list becomes empty, but before misc_deregister() is called,
> a separate thread opens the device again.
> > 5) In the separate thread, az_blob_fop_open() obtains the file_lock spin
> lock.
> > 6) Before az_blob_fop_open() releases the spin lock, az
> > block_remove_device() completes, along with az_blob_remove().
> > 7) Then the device gets rebound, and az_blob_connect_to_vsp() gets
> > called, all while az_blob_fop_open() still holds the spin lock. So the spin
> lock get re-initialized while it is held.
> >
> > Between step 4 and step 5, I don't see any guarantee that
> > az_blob_fop_open() can't run concurrently on another CPU after
> > misc_deregister() finishes. misc_deregister() calls
> > devtmpfs_delete_node() to remove the device file from /dev/*, but it
> doesn't check the return value, so the inode reference number can be non-
> zero after it returns, somebody may still try to open it.
>
> I'm no expert here, but once misc_deregister() finishes, I would expect that
> any attempt to open the device with the assigned major:minor number will
> fail. However, existing opens may still be active. So another thread could
> still
> have it open, but no new opens can be done. As coded in this version of
> your patch, az_blob_remove() waits until all those existing opens have been
> closed,
> so everything is clean once the wait is complete. Of course, while waiting
> here, the threads with the open fd could try to start another operation
> against the VSP, but the "removing" flag will prevent that. So the only access
> these threads will have is to the singleton az_blob_dev instance and the list
> of open files.
>
> But as noted previously, waiting here for the opens to be closed is
> problematic because the wait could be arbitrarily long. And there's
> messiness if the device gets re-added later, because there's no distinction
> between the old instantiation that a thread might still have open vs. the new
> instantiation that you want to initialize fresh. That's where creating a new
> dynamic instance will solve any problems.
As I said in my previous email, I'm moving to dynamically allocated objects to fix all of these.
>
> > This check guarantees that the code can't reference any driver's internal
> data structures.
> > az_blob_dev.removing is set so this code can't be entered. Resetting
> > it after az_blob_create_device() is also for this purpose.
> >
> > >
> > > > +
> > > > + az_blob_info("removing device\n");
> > > > + az_blob_remove_device();
> > > > +
> > > > + /*
> > > > + * At this point, it's not possible to open more files.
> > > > + * Wait for all the opened files to be released.
> > > > + */
> > > > + wait_event(az_blob_dev.file_wait,
> > > > +list_empty(&az_blob_dev.file_list));
> > >
> > > As mentioned in my most recent comments on the previous version of
> > > the code, I'm concerned about waiting for all open files to be
> > > released in the case of a VMbus rescind. We definitely have to wait
> > > for all VSP operations to complete, but that's different from
> > > waiting for the files to be closed. The former depends on Hyper-V
> > > being well-behaved and will presumably happen quickly in the case of
> > > a rescind. But the latter depends entirely on user space code that
> > > is out of the kernel's control. The user space process could hang
> > > around for hours or days with the file still open, which would block this
> function from completing, and hence block the global work queue.
> > >
> > > Thinking about this, the core issue may be that having a single
> > > static instance of az_blob_device is problematic if we allow the
> > > device to be removed (either explicitly with an unbind, or
> > > implicitly with a VMbus
> > > rescind) and then re-added. Perhaps what needs to happen is that
> > > the removed device is allowed to continue to exist as long as user
> > > space processes have an open file handle, but they can't perform any
> > > operations because the "removing" flag is set (and stays set).
> > > If the device is re-added, then a new instance of az_blob_device
> > > should be created, and whether or not the old instance is still
> > > hanging around is irrelevant.
> >
> > I agree dynamic device objects is the way to go. Will implement this.
> >
> > >
> > > > +
> > > > + az_blob_remove_vmbus(dev);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static struct hv_driver az_blob_drv = {
> > > > + .name = KBUILD_MODNAME,
> > > > + .id_table = id_table,
> > > > + .probe = az_blob_probe,
> > > > + .remove = az_blob_remove,
> > > > + .driver = {
> > > > + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> > > > + },
> > > > +};
> > > > +
> > > > +static int __init az_blob_drv_init(void) {
> > > > + return vmbus_driver_register(&az_blob_drv);
> > > > +}
> > > > +
> > > > +static void __exit az_blob_drv_exit(void) {
> > > > + vmbus_driver_unregister(&az_blob_drv);
> > > > +}
> > > > +
> > > > +MODULE_LICENSE("Dual BSD/GPL");
> > > > +MODULE_DESCRIPTION("Microsoft Azure Blob driver");
> > > > +module_init(az_blob_drv_init); module_exit(az_blob_drv_exit);
> > > > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > > > index 705e95d..3095611 100644
> > > > --- a/drivers/hv/channel_mgmt.c
> > > > +++ b/drivers/hv/channel_mgmt.c
> > > > @@ -154,6 +154,13 @@
> > > > .allowed_in_isolated = false,
> > > > },
> > > >
> > > > + /* Azure Blob */
> > > > + { .dev_type = HV_AZURE_BLOB,
> > > > + HV_AZURE_BLOB_GUID,
> > > > + .perf_device = false,
> > > > + .allowed_in_isolated = false,
> > > > + },
> > > > +
> > > > /* Unknown GUID */
> > > > { .dev_type = HV_UNKNOWN,
> > > > .perf_device = false,
> > > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index
> > > > d1e59db..ac31362 100644
> > > > --- a/include/linux/hyperv.h
> > > > +++ b/include/linux/hyperv.h
> > > > @@ -772,6 +772,7 @@ enum vmbus_device_type {
> > > > HV_FCOPY,
> > > > HV_BACKUP,
> > > > HV_DM,
> > > > + HV_AZURE_BLOB,
> > > > HV_UNKNOWN,
> > > > };
> > > >
> > > > @@ -1350,6 +1351,14 @@ int vmbus_allocate_mmio(struct resource
> > > **new, struct hv_device *device_obj,
> > > > 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)
> > > >
> > > > /*
> > > > + * Azure Blob GUID
> > > > + * {0590b792-db64-45cc-81db-b8d70c577c9e}
> > > > + */
> > > > +#define HV_AZURE_BLOB_GUID \
> > > > + .guid = GUID_INIT(0x0590b792, 0xdb64, 0x45cc, 0x81, 0xdb, \
> > > > + 0xb8, 0xd7, 0x0c, 0x57, 0x7c, 0x9e)
> > > > +
> > > > +/*
> > > > * Shutdown GUID
> > > > * {0e0b6031-5213-4934-818b-38d90ced39db}
> > > > */
> > > > diff --git a/include/uapi/misc/azure_blob.h
> > > > b/include/uapi/misc/azure_blob.h new file mode 100644 index
> > > > 0000000..f4168679
> > > > --- /dev/null
> > > > +++ b/include/uapi/misc/azure_blob.h
> > > > @@ -0,0 +1,34 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-only WITH
> > > > +Linux-syscall-note */
> > > > +/* Copyright (c) Microsoft Corporation. */
> > > > +
> > > > +#ifndef _AZ_BLOB_H
> > > > +#define _AZ_BLOB_H
> > > > +
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/uuid.h>
> > > > +
> > > > +/* user-mode sync request sent through ioctl */ struct
> > > > +az_blob_request_sync_response {
> > > > + __u32 status;
> > > > + __u32 response_len;
> > > > +};
> > > > +
> > > > +struct az_blob_request_sync {
> > > > + guid_t guid;
> > > > + __u32 timeout;
> > > > + __u32 request_len;
> > > > + __u32 response_len;
> > > > + __u32 data_len;
> > > > + __u32 data_valid;
> > > > + __aligned_u64 request_buffer;
> > >
> > > Is there an implied 32-bit padding field before "request_buffer"?
> > > It seems like "yes", since there are five 32-bit field preceding.
> > > Would it be better to explicitly list that padding field?
> > >
> > > > + __aligned_u64 response_buffer;
> > > > + __aligned_u64 data_buffer;
> > > > + struct az_blob_request_sync_response response; };
> > > > +
> > > > +#define AZ_BLOB_MAGIC_NUMBER 'R'
> > > > +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> > > > + _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> > > > + struct az_blob_request_sync)
> > > > +
> > > > +#endif /* define _AZ_BLOB_H */
> > > > --
> > > > 1.8.3.1
Powered by blists - more mailing lists