Message-ID: <CAOQ4uxjyR3PBgKV2Dfsm8UCALhWpirmEG+yPdooVoqJfEf2XWw@mail.gmail.com>
Date: Thu, 28 Oct 2021 08:11:15 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Ioannis Angelakopoulos <iangelak@...hat.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
virtio-fs-list <virtio-fs@...hat.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Jan Kara <jack@...e.cz>, Al Viro <viro@...iv.linux.org.uk>,
Miklos Szeredi <miklos@...redi.hu>,
Steve French <sfrench@...ba.org>
Subject: Re: [RFC PATCH 0/7] Inotify support in FUSE and virtiofs
On Wed, Oct 27, 2021 at 11:24 PM Vivek Goyal <vgoyal@...hat.com> wrote:
>
> On Wed, Oct 27, 2021 at 08:59:15AM +0300, Amir Goldstein wrote:
> > On Tue, Oct 26, 2021 at 10:14 PM Ioannis Angelakopoulos
> > <iangelak@...hat.com> wrote:
> > >
> > >
> > >
> > > On Tue, Oct 26, 2021 at 2:27 PM Vivek Goyal <vgoyal@...hat.com> wrote:
> > >>
> > >> On Tue, Oct 26, 2021 at 08:59:44PM +0300, Amir Goldstein wrote:
> > >> > On Tue, Oct 26, 2021 at 7:18 PM Vivek Goyal <vgoyal@...hat.com> wrote:
> > >> > >
> > >> > > On Tue, Oct 26, 2021 at 06:23:50PM +0300, Amir Goldstein wrote:
> > >> > >
> > >> > > [..]
> > >> > > > > 3) The lifetime of the local watch in the guest kernel is very
> > >> > > > > important. Specifically, there is a possibility that the guest does not
> > >> > > > > receive remote events on time, if it removes its local watch on the
> > >> > > > > target or deletes the inode (and thus the guest kernel removes the watch).
> > >> > > > > In these cases the guest kernel removes the local watch before the
> > >> > > > > remote events arrive from the host (virtiofsd) and as such the guest
> > >> > > > > kernel drops all the remote events for the target inode (since the
> > >> > > > > corresponding local watch does not exist anymore).
> > >> > >
> > >> > > So this is one of the issues which has been haunting us in virtiofs. If
> > >> > > a file is removed, for local events, the event is generated first and
> > >> > > then the watch is removed. But in the case of remote filesystems, it is
> > >> > > racy. It is possible that by the time the event arrives, the watch is
> > >> > > already gone and the application never sees the delete event.
> > >> > >
> > >> > > Not sure how to address this issue.
> > >> >
> > >>
> > >> > Can you take me through the scenario step by step.
> > >> > I am not sure I understand the exact sequence of the race.
> > >>
> > >> Ioannis, please correct me if I get something wrong. You know the exact
> > >> details much better than I do.
> > >>
> > >> A. Say a guest process unlinks a file.
> > >> B. FUSE sends an unlink request to the server (virtiofsd).
> > >> C. The file is unlinked on the host. Assume there are no other users, so
> > >> the inode will be freed as well. An event will be generated on the host
> > >> and the watch removed.
> > >> D. Now the FUSE server sends the unlink reply. The unlink notification
> > >> might still be in kernel buffers, in virtiofsd, or in the virtiofs
> > >> virtqueue.
> > >> E. The FUSE client receives the unlink reply and removes the local watch.
> > >>
> > >> The FUSE reply and the notification event now travel in parallel on
> > >> different virtqueues, and nothing ties the two together. It could very
> > >> well happen that the FUSE reply arrives first, gets processed first, and
> > >> the local watch is removed. The notification is processed right after,
> > >> but by then the local watch is gone and the filesystem is forced to drop
> > >> the event.
> > >>
> > >> As of now, the situation is more complicated in virtiofsd. We don't use
> > >> file handles; instead we keep an O_PATH fd open for each file. That means
> > >> that in step D above, the inode on the host is not freed yet and the
> > >> unlink event is not generated yet. When the unlink reply reaches the FUSE
> > >> client, it sends FORGET messages to the server, the server then closes
> > >> the O_PATH fd, and only then does the host generate the unlink events. By
> > >> that time it is too late: the guest has already removed its local watches
> > >> (and triggered removal of the remote watches too).
> > >>
> > >> This second problem can probably be solved by using file handles, but
> > >> the basic race will still be there.
> > >>
> > >> > If it is local file removal that causes watch to be removed,
> > >> > then don't drop local events and you are good to go.
> > >> > Is it something else?
> > >>
> > >> - If remote events are enabled, then the idea is that user space gets
> > >> an event when the file is actually removed from the server, right? Now,
> > >> it is possible that another VM has this file open and the file has not
> > >> been removed yet. So the local event only tells you that the file has
> > >> been removed in the guest VM (locally) but says nothing about the state
> > >> of the file on the server. (It has been unlinked on the server, but the
> > >> inode continues to be alive internally.)
> > >>
> > >> - If the user receives both local and remote delete events, it will be
> > >> confusing. I guess if we want to report both events, then there has to
> > >> be some information in the event that says whether it is local or
> > >> remote, so that the application can act accordingly.
> > >>
> > >> Thanks
> > >> Vivek
> > >>
> > >
> > > Hello Amir!
> > >
> > > Sorry for joining the conversation a bit late. Vivek was on point with the
> > > example he gave, but the race is more generic than just the DELETE event.
> > >
> > > Let's say that a guest process monitors an inode for OPEN events:
> > >
> > > 1) The same guest process or another guest process opens the file (related to the
> > > monitored inode), and then closes and immediately deletes the file/inode.
> > > 2) The FUSE server (virtiofsd) will mimic the operations of the guest process:
> > > a) It will open the file on the host side, so a remote OPEN event will be
> > > generated on the host and sent to the guest.
> > > b) It will unlink the remote inode, and if no other host process uses the inode,
> > > the inode will be freed and a DELETE event will be generated on the host and sent
> > > to the guest. (However, due to how virtiofsd works, as Vivek mentioned, this step
> > > won't happen immediately.)
> > >
> >
> > You are confusing DELETE with DELETE_SELF.
> > DELETE corresponds to unlink(), so you get a DELETE event even if the
> > inode is a hardlink with nlink > 0 after unlink().
> >
> > The DELETE event is reported (along with filename) against the parent directory
> > inode, so the test case above won't drop the event.
>
> Hi Amir,
>
> Agreed that there is confusion between DELETE and DELETE_SELF events. I
> think Ioannis is referring to the DELETE_SELF event. With this example he
> is trying to emphasize that, due to races, the problem is not limited to
> DELETE_SELF events: other events could arrive a little later, after the
> local watch in the guest has been removed, and then all those events
> will be dropped as well. So he gave the example of the OPEN event. And
> I think a remote IN_IGNORED might face the same fate.
>
> In the case of IN_IGNORED, I am wondering whether it is ok to generate
> that event locally instead.
>
Yes, that is what I meant.
The removal of the local watch can be used as a hint to applications that
events might have been lost.
> >
> > > The problem here is that the OPEN event might still be travelling towards the guest in the
> > > virtqueues and arrive after the guest has already deleted its local inode.
> > > While the remote event (OPEN) received by the guest is valid, its fsnotify
> > > subsystem will drop it since the local inode is no longer there.
> > >
> >
> > I have a feeling that we are mixing issues related to a shared server
> > and to remote fsnotify.
> > Does virtiofsd support multiple guests already?
>
> I would like to think that there are not many basic issues with the shared
> directory configuration, so we don't stop users from using it.
>
> > There are many other issues related to cache coherency that should be
> > dealt with in this case, some of which overlap the problem that you
> > describe, so solving the narrow problem of dropped remote events seems
> > like the wrong way to approach the problem.
>
> The dropped remote event problem/race will exist even without a shared
> server. Remote events travel through a different virtqueue than FUSE
> request replies, so there is no guarantee about the order in which events
> and replies will be processed.
>
> >
> > I think that in a shared server situation, the simple LOOKUP/FORGET protocol
> > will not suffice.
>
> Hmm... So what's the problem with LOOKUP/FORGET in the shared dir case?
>
I guess when we say the "shared dir" case, we mean different things.
I was thinking about "shared between different clients" and about the
fact that information about the server's refcount of objects is obscured
from the client.
For example, LOOKUP_INC/DEC requests that report back the server's
shared object refcount could prevent a guest inode from being evicted
in case the inode has a remote watch.
But you were referring to "shared dir with guest and host", so the LOOKUP
refcount is not relevant in that case.
> > I will not even try to solve the generic problem, but will just
> > mention that the SMB/NFS protocols use delegations/oplocks to advertise
> > object usage by other clients to all clients.
>
> I think Miklos was looking into the idea of some sort of file leases
> on FUSE for creating an equivalent of delegations. Not sure if that work
> made any progress.
>
> >
> > I think this issue is far outside the scope of your project and you should
> > just leave the dropped events as a known limitation at this point.
>
> Maybe that's what we should do to begin with, and just say these events
> can be lost or never arrive.
>
> > inotify has the event IN_IGNORED that an application can use as a hint
> > that some events could have been dropped.
>
> That will probably require generating IN_IGNORED locally when the local
> watch goes away (and not relying on a remote IN_IGNORED), IIUC.
>
Right.
Thanks,
Amir.