lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <BY5PR21MB1506A8B3AA10BDC0CC87A399CE119@BY5PR21MB1506.namprd21.prod.outlook.com>
Date:   Fri, 16 Jul 2021 21:19:37 +0000
From:   Long Li <longli@...rosoft.com>
To:     Michael Kelley <mikelley@...rosoft.com>,
        "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
CC:     Jonathan Corbet <corbet@....net>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Hans de Goede <hdegoede@...hat.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        Maximilian Luz <luzmaximilian@...il.com>,
        Mike Rapoport <rppt@...nel.org>,
        Ben Widawsky <ben.widawsky@...el.com>,
        Jiri Slaby <jirislaby@...nel.org>,
        Andra Paraschiv <andraprs@...zon.com>,
        Siddharth Gupta <sidgup@...eaurora.org>,
        Hannes Reinecke <hare@...e.de>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: RE: [Patch v3 2/3] Drivers: hv: add Azure Blob driver

> Subject: RE: [Patch v3 2/3] Drivers: hv: add Azure Blob driver
> 
> From: Long Li <longli@...rosoft.com> Sent: Friday, July 16, 2021 12:27 PM
> 
> [snip]
> 
> > > > +
> > > > +static int az_blob_connect_to_vsp(struct hv_device *device, u32
> > > > +ring_size) {
> > > > +	int ret;
> > > > +
> > > > +	spin_lock_init(&az_blob_dev.file_lock);
> > > > +	INIT_LIST_HEAD(&az_blob_dev.file_list);
> > > > +	init_waitqueue_head(&az_blob_dev.file_wait);
> > > > +
> > > > +	az_blob_dev.device = device;
> > > > +	device->channel->rqstor_size = AZ_BLOB_QUEUE_DEPTH;
> > > > +
> > > > +	ret = vmbus_open(device->channel, ring_size, ring_size, NULL, 0,
> > > > +			 az_blob_on_channel_callback, device->channel);
> > > > +
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	hv_set_drvdata(device, &az_blob_dev);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +static void az_blob_remove_vmbus(struct hv_device *device) {
> > > > +	/* At this point, no VSC/VSP traffic is possible over vmbus */
> > > > +	hv_set_drvdata(device, NULL);
> > > > +	vmbus_close(device->channel);
> > > > +}
> > > > +
> > > > +static int az_blob_probe(struct hv_device *device,
> > > > +			 const struct hv_vmbus_device_id *dev_id) {
> > > > +	int ret;
> > > > +
> > > > +	ret = az_blob_connect_to_vsp(device, AZ_BLOB_RING_SIZE);
> > > > +	if (ret) {
> > > > +		az_blob_err("error connecting to VSP ret=%d\n", ret);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	// create user-mode client library facing device
> > > > +	ret = az_blob_create_device(&az_blob_dev);
> > > > +	if (ret) {
> > > > +		az_blob_err("failed to create device ret=%d\n", ret);
> > > > +		az_blob_remove_vmbus(device);
> > > > +		return ret;
> > > > +	}
> > > > +
> > > > +	az_blob_dev.removing = false;
> > >
> > > This line seems misplaced.  As soon as az_blob_create_device()
> > > returns, some other thread could find the device and open it.  You
> > > don't want the
> > > az_blob_fop_open() function to find the "removing"
> > > flag set to true.  So I think this line should go *before* the call
> > > to az_blob_create_device().
> > >
> > > > +	az_blob_info("successfully probed device\n");
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int az_blob_remove(struct hv_device *dev) {
> > > > +	az_blob_dev.removing = true;
> > > > +	/*
> > > > +	 * RCU lock guarantees that any future calls to az_blob_fop_open()
> > > > +	 * can not use device resources while the inode reference of
> > > > +	 * /dev/azure_blob is being held for that open, and device file is
> > > > +	 * being removed from /dev.
> > > > +	 */
> > > > +	synchronize_rcu();
> > >
> > > I don't think this works as you have described.  While it will
> > > ensure that any in-progress RCU read-side critical sections have
> > > completed (i.e., in
> > > az_blob_fop_open() ), it does not prevent new read-side critical
> > > sections from being started.  So as soon as synchronize_rcu() is
> > > finished, another thread could find and open the device, and be
> > > executing in az_blob_fop_open().
> > >
> > > But it's not clear to me that this (and the rcu_read_locks in
> > > az_blob_fop_open) are really needed anyway.  If
> > > az_blob_remove_device() is called while one or more threads have it
> > > open, I think that's OK.  Or is there a scenario that I'm missing?
> >
> > I was trying to address your comment earlier. Here were your comments (1
> - 7):
> >
> > 1) The driver is unbound, so az_blob_remove() is called.
> > 2) az_blob_remove() sets the "removing" flag to true, and calls
> az_blob_remove_device().
> > 3) az_blob_remove_device() waits for the file_list to become empty.
> > 4) After the file_list becomes empty, but before misc_deregister() is called,
> a separate thread opens the device again.
> > 5) In the separate thread, az_blob_fop_open() obtains the file_lock spin
> lock.
> > 6) Before az_blob_fop_open() releases the spin lock, az
> > block_remove_device() completes, along with az_blob_remove().
> > 7) Then the device gets rebound, and az_blob_connect_to_vsp() gets
> > called, all while az_blob_fop_open() still holds the spin lock.  So the spin
> lock get re-initialized while it is held.
> >
> > Between step 4 and step 5, I don't see any guarantee that
> > az_blob_fop_open() can't run concurrently on another CPU after
> > misc_deregister() finishes. misc_deregister() calls
> > devtmpfs_delete_node() to remove the device file from /dev/*, but it
> doesn't check the return value, so the inode reference number can be non-
> zero after it returns, somebody may still try to open it.
> 
> I'm no expert here, but once misc_deregister() finishes, I would expect that
> any attempt to open the device with the assigned major:minor number will
> fail.  However, existing opens may still be active.   So another thread could
> still
> have it open, but no new opens can be done.  As coded in this version of
> your patch, az_blob_remove() waits until all those existing opens have been
> closed,
> so everything is clean once the wait is complete.   Of course, while waiting
> here, the threads with the open fd could try to start another operation
> against the VSP, but the "removing" flag will prevent that.  So the only access
> these threads will have is to the singleton az_blob_dev instance and the list
> of open files.
> 
> But as noted previously, waiting here for the opens to be closed is
> problematic because the wait could be arbitrarily long.  And there's
> messiness if the device gets re-added later, because there's no distinction
> between the old instantiation that a thread might still have open vs. the new
> instantiation that you want to initialize fresh.  That's where creating a new
> dynamic instance will solve any problems.

As I said in my previous email, I'm moving to dynamically allocated objects to fix all of these.

> 
> > This check guarantees that the code can't reference any driver's internal
> data structures.
> > az_blob_dev.removing is set so this code can't be entered. Resetting
> > it after az_blob_create_device() is also for this purpose.
> >
> > >
> > > > +
> > > > +	az_blob_info("removing device\n");
> > > > +	az_blob_remove_device();
> > > > +
> > > > +	/*
> > > > +	 * At this point, it's not possible to open more files.
> > > > +	 * Wait for all the opened files to be released.
> > > > +	 */
> > > > +	wait_event(az_blob_dev.file_wait,
> > > > +list_empty(&az_blob_dev.file_list));
> > >
> > > As mentioned in my most recent comments on the previous version of
> > > the code, I'm concerned about waiting for all open files to be
> > > released in the case of a VMbus rescind.  We definitely have to wait
> > > for all VSP operations to complete, but that's different from
> > > waiting for the files to be closed.  The former depends on Hyper-V
> > > being well-behaved and will presumably happen quickly in the case of
> > > a rescind.  But the latter depends entirely on user space code that
> > > is out of the kernel's control.  The user space process could hang
> > > around for hours or days with the file still open, which would block this
> function from completing, and hence block the global work queue.
> > >
> > > Thinking about this, the core issue may be that having a single
> > > static instance of az_blob_device is problematic if we allow the
> > > device to be removed (either explicitly with an unbind, or
> > > implicitly with a VMbus
> > > rescind) and then re-added.  Perhaps what needs to happen is that
> > > the removed device is allowed to continue to exist as long as user
> > > space processes have an open file handle, but they can't perform any
> > > operations because the "removing" flag is set (and stays set).
> > > If the device is re-added, then a new instance of az_blob_device
> > > should be created, and whether or not the old instance is still
> > > hanging around is irrelevant.
> >
> > I agree dynamic device objects is the way to go. Will implement this.
> >
> > >
> > > > +
> > > > +	az_blob_remove_vmbus(dev);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static struct hv_driver az_blob_drv = {
> > > > +	.name = KBUILD_MODNAME,
> > > > +	.id_table = id_table,
> > > > +	.probe = az_blob_probe,
> > > > +	.remove = az_blob_remove,
> > > > +	.driver = {
> > > > +		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
> > > > +	},
> > > > +};
> > > > +
> > > > +static int __init az_blob_drv_init(void) {
> > > > +	return vmbus_driver_register(&az_blob_drv);
> > > > +}
> > > > +
> > > > +static void __exit az_blob_drv_exit(void) {
> > > > +	vmbus_driver_unregister(&az_blob_drv);
> > > > +}
> > > > +
> > > > +MODULE_LICENSE("Dual BSD/GPL");
> > > > +MODULE_DESCRIPTION("Microsoft Azure Blob driver");
> > > > +module_init(az_blob_drv_init); module_exit(az_blob_drv_exit);
> > > > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > > > index 705e95d..3095611 100644
> > > > --- a/drivers/hv/channel_mgmt.c
> > > > +++ b/drivers/hv/channel_mgmt.c
> > > > @@ -154,6 +154,13 @@
> > > >  	  .allowed_in_isolated = false,
> > > >  	},
> > > >
> > > > +	/* Azure Blob */
> > > > +	{ .dev_type = HV_AZURE_BLOB,
> > > > +	  HV_AZURE_BLOB_GUID,
> > > > +	  .perf_device = false,
> > > > +	  .allowed_in_isolated = false,
> > > > +	},
> > > > +
> > > >  	/* Unknown GUID */
> > > >  	{ .dev_type = HV_UNKNOWN,
> > > >  	  .perf_device = false,
> > > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index
> > > > d1e59db..ac31362 100644
> > > > --- a/include/linux/hyperv.h
> > > > +++ b/include/linux/hyperv.h
> > > > @@ -772,6 +772,7 @@ enum vmbus_device_type {
> > > >  	HV_FCOPY,
> > > >  	HV_BACKUP,
> > > >  	HV_DM,
> > > > +	HV_AZURE_BLOB,
> > > >  	HV_UNKNOWN,
> > > >  };
> > > >
> > > > @@ -1350,6 +1351,14 @@ int vmbus_allocate_mmio(struct resource
> > > **new, struct hv_device *device_obj,
> > > >  			  0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)
> > > >
> > > >  /*
> > > > + * Azure Blob GUID
> > > > + * {0590b792-db64-45cc-81db-b8d70c577c9e}
> > > > + */
> > > > +#define HV_AZURE_BLOB_GUID \
> > > > +	.guid = GUID_INIT(0x0590b792, 0xdb64, 0x45cc, 0x81, 0xdb, \
> > > > +			  0xb8, 0xd7, 0x0c, 0x57, 0x7c, 0x9e)
> > > > +
> > > > +/*
> > > >   * Shutdown GUID
> > > >   * {0e0b6031-5213-4934-818b-38d90ced39db}
> > > >   */
> > > > diff --git a/include/uapi/misc/azure_blob.h
> > > > b/include/uapi/misc/azure_blob.h new file mode 100644 index
> > > > 0000000..f4168679
> > > > --- /dev/null
> > > > +++ b/include/uapi/misc/azure_blob.h
> > > > @@ -0,0 +1,34 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-only WITH
> > > > +Linux-syscall-note */
> > > > +/* Copyright (c) Microsoft Corporation. */
> > > > +
> > > > +#ifndef _AZ_BLOB_H
> > > > +#define _AZ_BLOB_H
> > > > +
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/uuid.h>
> > > > +
> > > > +/* user-mode sync request sent through ioctl */ struct
> > > > +az_blob_request_sync_response {
> > > > +	__u32 status;
> > > > +	__u32 response_len;
> > > > +};
> > > > +
> > > > +struct az_blob_request_sync {
> > > > +	guid_t guid;
> > > > +	__u32 timeout;
> > > > +	__u32 request_len;
> > > > +	__u32 response_len;
> > > > +	__u32 data_len;
> > > > +	__u32 data_valid;
> > > > +	__aligned_u64 request_buffer;
> > >
> > > Is there an implied 32-bit padding field before "request_buffer"?
> > > It seems like "yes", since there are five 32-bit field preceding.
> > > Would it be better to explicitly list that padding field?
> > >
> > > > +	__aligned_u64 response_buffer;
> > > > +	__aligned_u64 data_buffer;
> > > > +	struct az_blob_request_sync_response response; };
> > > > +
> > > > +#define AZ_BLOB_MAGIC_NUMBER	'R'
> > > > +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> > > > +		_IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> > > > +			struct az_blob_request_sync)
> > > > +
> > > > +#endif /* define _AZ_BLOB_H */
> > > > --
> > > > 1.8.3.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ