[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BY5PR21MB15060A31D031B556700F2767CE119@BY5PR21MB1506.namprd21.prod.outlook.com>
Date: Fri, 16 Jul 2021 19:29:30 +0000
From: Long Li <longli@...rosoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Michael Kelley <mikelley@...rosoft.com>
CC: "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
Jonathan Corbet <corbet@....net>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Hans de Goede <hdegoede@...hat.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
Maximilian Luz <luzmaximilian@...il.com>,
Mike Rapoport <rppt@...nel.org>,
Ben Widawsky <ben.widawsky@...el.com>,
Jiri Slaby <jirislaby@...nel.org>,
Andra Paraschiv <andraprs@...zon.com>,
Siddharth Gupta <sidgup@...eaurora.org>,
Hannes Reinecke <hare@...e.de>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>
Subject: RE: [Patch v3 2/3] Drivers: hv: add Azure Blob driver
> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Sent: Friday, July 16, 2021 10:28 AM
> To: Michael Kelley <mikelley@...rosoft.com>
> Cc: longli@...uxonhyperv.com; linux-kernel@...r.kernel.org; linux-
> hyperv@...r.kernel.org; Long Li <longli@...rosoft.com>; Jonathan Corbet
> <corbet@....net>; KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; Stephen Hemminger
> <sthemmin@...rosoft.com>; Wei Liu <wei.liu@...nel.org>; Dexuan Cui
> <decui@...rosoft.com>; Bjorn Andersson <bjorn.andersson@...aro.org>;
> Hans de Goede <hdegoede@...hat.com>; Williams, Dan J
> <dan.j.williams@...el.com>; Maximilian Luz <luzmaximilian@...il.com>;
> Mike Rapoport <rppt@...nel.org>; Ben Widawsky
> <ben.widawsky@...el.com>; Jiri Slaby <jirislaby@...nel.org>; Andra
> Paraschiv <andraprs@...zon.com>; Siddharth Gupta
> <sidgup@...eaurora.org>; Hannes Reinecke <hare@...e.de>; linux-
> doc@...r.kernel.org
> Subject: Re: [Patch v3 2/3] Drivers: hv: add Azure Blob driver
>
> On Fri, Jul 16, 2021 at 03:56:14PM +0000, Michael Kelley wrote:
> > > +static int az_blob_remove(struct hv_device *dev) {
> > > + az_blob_dev.removing = true;
> > > + /*
> > > + * RCU lock guarantees that any future calls to az_blob_fop_open()
> > > + * can not use device resources while the inode reference of
> > > + * /dev/azure_blob is being held for that open, and device file is
> > > + * being removed from /dev.
> > > + */
> > > + synchronize_rcu();
> >
> > I don't think this works as you have described. While it will ensure
> > that any in-progress RCU read-side critical sections have completed
> > (i.e., in az_blob_fop_open() ), it does not prevent new read-side
> > critical sections from being started. So as soon as synchronize_rcu()
> > is finished, another thread could find and open the device, and be
> > executing in az_blob_fop_open().
> >
> > But it's not clear to me that this (and the rcu_read_locks in
> > az_blob_fop_open) are really needed anyway. If
> > az_blob_remove_device() is called while one or more threads have it open,
> I think that's OK. Or is there a scenario that I'm missing?
>
> This should not be different from any other tiny character device, why the
> mess with RCU at all?
>
> > > + az_blob_info("removing device\n");
> > > + az_blob_remove_device();
> > > +
> > > + /*
> > > + * At this point, it's not possible to open more files.
> > > + * Wait for all the opened files to be released.
> > > + */
> > > + wait_event(az_blob_dev.file_wait,
> > > +list_empty(&az_blob_dev.file_list));
> >
> > As mentioned in my most recent comments on the previous version of the
> > code, I'm concerned about waiting for all open files to be released in
> > the case of a VMbus rescind. We definitely have to wait for all VSP
> > operations to complete, but that's different from waiting for the
> > files to be closed. The former depends on Hyper-V being well-behaved
> > and will presumably happen quickly in the case of a rescind. But the
> > latter depends entirely on user space code that is out of the kernel's
> > control. The user space process could hang around for hours or days
> > with the file still open, which would block this function from completing,
> and hence block the global work queue.
> >
> > Thinking about this, the core issue may be that having a single static
> > instance of az_blob_device is problematic if we allow the device to be
> > removed (either explicitly with an unbind, or implicitly with a VMbus
> > rescind) and then re-added. Perhaps what needs to happen is that the
> > removed device is allowed to continue to exist as long as user space
> > processes have an open file handle, but they can't perform any
> > operations because the "removing" flag is set (and stays set).
> > If the device is re-added, then a new instance of az_blob_device
> > should be created, and whether or not the old instance is still
> > hanging around is irrelevant.
>
> You should never have a single static copy of the device, that was going to be
> my first review comment once this all actually got to a place that made sense
> to review (which it is not even there yet.) When you do that, then you have
> these crazy race issues you speak of. Use the misc api correctly and you will
> not have any of these problems, why people try to make it harder is beyond
> me...
>
> thanks,
>
> greg k-h
I will address all the comments and send the driver for broader review including
linux-fsdevel and linux-block.
Thanks,
Long
Powered by blists - more mailing lists