[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0906221808010.10952@makko.or.mcafeemobile.com>
Date: Mon, 22 Jun 2009 18:17:56 -0700 (PDT)
From: Davide Libenzi <davidel@...ilserver.org>
To: Gregory Haskins <gregory.haskins@...il.com>
cc: Gregory Haskins <ghaskins@...ell.com>,
"Michael S. Tsirkin" <mst@...hat.com>, kvm@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
avi@...hat.com, paulmck@...ux.vnet.ibm.com,
Ingo Molnar <mingo@...e.hu>,
Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix
notifier race conditions
On Mon, 22 Jun 2009, Gregory Haskins wrote:
> Davide Libenzi wrote:
> > On Mon, 22 Jun 2009, Gregory Haskins wrote:
> >
> >
> >> So up in userspace, the vbus pci-device would have an open reference to
> >> the kvm guest (derived from /dev/kvm) and an open reference to a vbus
> >> (derived from /dev/vbus). Lets call these kvmfd, and vbusfd,
> >> respectively. For something like an interrupt, we would hook the point
> >> where the PCI-MSI interrupt is assigned, and would do the following:
> >>
> >> gsi = kvm_irq_route_gsi();
> >> fd = eventfd(0, 0);
> >> ioctl(kvmfd, KVM_IRQFD_ASSIGN, {fd, gsi});
> >> ioctl(vbusfd, VBUS_SHMSIGNAL_ASSIGN, {sigid, fd});
> >>
> >> So userspace orchestrated the assignment of this one eventfd to a KVM
> >> consumer, and a VBUS producer. The two subsystems do not care about the
> >> details of the other side of the link, per se. VBUS just knows that it
> >> can eventfd_signal() its memory region to tell whomever is listening
> >> that it changed. Likewise, KVM just knows to inject "gsi" when it gets
> >> signalled. You could equally have given "fd" to a userspace thread for
> >> either producer or consumer roles, or any other combination.
> >>
> >> If we were doing PCI-passthough, substitute the last SHMSIGNAL_ASSIGN
> >> ioctl call with some PCI_PASSTHROUGH_ASSIGN verb and you get the idea.
> >>
> >> The important thing is that once this is established, userspace doesn't
> >> necessarily care about the fd anymore. So now the question is: do we
> >> keep it around for other things? Do we keep it around because we don't
> >> want KVM to see the POLLHUP, or do we address the "release" code so that
> >> it works even if userspace issued close(fd) at this point. I am not
> >> sure what the answer is, but this is the scenario we are concerned with
> >> in this thread. In the example above, vbus is free to produce events on
> >> its eventfd until it gets a SHMSIGNAL_DEASSIGN request.
> >>
> >
> > I see.
> > The thing remains, that in order to reliably handle generic
> > producer/consumer scenarios you'd need a reference counting similar to
> > pipes, where the notion of producer and consumer is very well defined.
> >
>
> I see your point.
>
> Well, I think the more important thing here is that we address the
> races, and add support for DEASSIGN. We can do both of those things
> with any of the patches that you and I have been kicking around. So
> what I propose is that we move forward with whatever patch you bless as
> proper for now. This producer-release issue is pretty minor in the
> grand scheme of things. We can always just have userspace hold the fd.
Is it a real problem? Can it be decently handled on the KVM side?
Reason I'm asking, is that I wouldn't want to change the interface, only
to find it unsuitable a few weeks later.
> I can either take in the last one you sent, or it sounds like you wanted
> to possibly do another round of cleanup? Whatever the case may be, let
> me know and we can coordinate with Andrew/Avi on what tree to pull it
> into. It sounds like riding in kvm.git is the perhaps the most logical.
Yes, I'd like to drop the pollcb bits (that you can implement KVM-side),
at least.
And yes, going kvm.git is probably the best path.
- Davide
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists