linux-kernel - Re: [PATCH v3 2/2] kvm: KVM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1341866113.2428.82.camel@bling.home>
Date:	Mon, 09 Jul 2012 14:35:13 -0600
From:	Alex Williamson <alex.williamson@...hat.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
Cc:	avi@...hat.com, gleb@...hat.com, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, jan.kiszka@...mens.com
Subject: Re: [PATCH v3 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

On Thu, 2012-07-05 at 18:53 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 10:24:54PM -0600, Alex Williamson wrote:
> > On Wed, 2012-07-04 at 17:00 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jul 03, 2012 at 01:21:29PM -0600, Alex Williamson wrote:
> > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > written for a specified irqchip pin.  By default this is a simple
> > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > This mode is particularly useful for device-assignment applications
> > > > where the unmask and notify triggers a hardware unmask.  The default
> > > > mode is most applicable to simple notify with no side-effects for
> > > > userspace usage, such as Qemu.
> > > > 
> > > > Here we make use of the reference counting of the _irq_source
> > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > of the release order.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
> > > > ---
> > > > 
> > > >  Documentation/virtual/kvm/api.txt |   21 ++++
> > > >  arch/x86/kvm/x86.c                |    1 
> > > >  include/linux/kvm.h               |   14 ++
> > > >  include/linux/kvm_host.h          |   13 ++
> > > >  virt/kvm/eventfd.c                |  208 +++++++++++++++++++++++++++++++++++++
> > > >  virt/kvm/kvm_main.c               |   11 ++
> > > >  6 files changed, 266 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > index c7267d5..a38af14 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1988,6 +1988,27 @@ to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > > >  is only necessary on setup, teardown is identical to that above.
> > > >  KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > >  
> > > > +4.77 KVM_EOIFD
> > > > +
> > > > +Capability: KVM_CAP_EOIFD
> > > 
> > > Maybe add a specific capability KVM_CAP_EOIFD_LEVEL_IRQFD too?
> > 
> > Good idea, allows them to be split later.
> > 
> > > > +Architectures: x86
> > > > +Type: vm ioctl
> > > > +Parameters: struct kvm_eoifd (in)
> > > > +Returns: 0 on success, -1 on error
> > > > +
> > > > +KVM_EOIFD allows userspace to receive interrupt EOI notification
> > > > +through an eventfd.  kvm_eoifd.fd specifies the eventfd used for
> > > > +notification and kvm_eoifd.gsi specifies the irchip pin, similar to
> > > > +KVM_IRQFD.  The eoifd is removed using the KVM_EOIFD_FLAG_DEASSIGN
> > > > +flag, specifying both kvm_eoifd.fd and kvm_eoifd.gsi.
> > > 
> > > This text reads like it would give you EOIs for any GSI,
> > > but you haven't actually implemented this for edge GSIs - and
> > > making it work would bloat the datapath for fast (edge) interrupts.
> > 
> > I do allow you to register any GSI, but you won't be getting EOIs unless
> > it's operating in level triggered mode.  Perhaps it's best to specify it
> > as unsupported and let some future patch create a new capability if
> > support is added.  I'll add a comment.
> > 
> > > What's the intended use of this part of the interface? qemu with
> > > irqchip disabled?
> > 
> > VFIO should not be dependent on KVM, therefore when kvm is not enabled
> > we need to add an interface in qemu for devices to be notified of eoi.
> >
> > This doesn't currently exist.  VFIO can take additional advantage of
> > irqchip when it is enabled, thus the interface below.  However, I don't
> > feel I can propose an eoi notifier in qemu that stops working as soon as
> > irqchip is enabled, even if I'm the only user.  This theoretical qemu
> > eoi notifier could then use the above when irqchip is enabled.
> 
> Well internal qemu APIs are qemu's problem and can be addressed there.
> For example, can we make it mimic our interface: make qemu EOI notifier
> accept an object that includes qemu_irq without irqchip and irqfd with?

So the qemu API changes based on whether irqchip is enabled or not?!

> In other words adding interface with no users looks weird.

If we could ever finish this patch, I could start working on the
user... ;)

> > > > +
> > > > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the provided
> > > > +kvm_eoifd stucture includes a valid kvm_eoifd.irqfd file descriptor
> > > > +for a level irqfd configured using the KVM_IRQFD_FLAG_LEVEL flag.
> > > > +In this mode the level interrupt is de-asserted prior to EOI eventfd
> > > > +notification.  The KVM_EOIFD_FLAG_LEVEL_IRQFD is only necessary on
> > > > +setup, teardown is identical to that above.
> > > 
> > > It seems a single fd can not be used multiple times for
> > > different irqfds? In that case:
> > > 1. Need to document this limitation
> > 
> > Ok
> > 
> > > 2. This differs from other notifiers e.g.  ioeventfd.
> > 
> > But the same as an irqfd.
> 
> However irqfd is about interrupting guest. eoifd is more like ioeventfd
> really: kvm writes ioeventfd/eoifd but reads irqfd.
> 
> >  Neither use case I'm thinking of needs to
> > allow eventfds to be re-used and knowing that an eventfd is unique on
> > our list makes matching it much easier.  For instance, if we have an
> > eoifd bound to a level irqfd and it's being de-assigned, we'd have to
> > require the eoifd is de-assigned before the irqfd so that we can
> > validate matching eventfd, gsi, and irqfd.  That gets very racy when
> > userspace aborts.
> 
> Why can't eoifd simply keep a reference to irqfd? Or register to
> the hangup event.

An irqfd is just a file descriptor and we can't block close().  I was
trying to use struct _irq_source as the reference counted object because
I didn't see any need to keep the irqfd around.

> > > Pls note: if you decide to remove this limitation, we need
> > > to add some other limit on the # of EOIFDs that VM can have
> > > as won't be limited by # of fds anymore.
> > 
> > I don't plan on changing this unless someone has an argument why it
> > would be useful.
> 
> For example you can use same eventfd as ioeventfd
> and eoifd.  Why not two eoifds?
> 
> I need to think about specifics more but it just breaks how eventfd is
> normally used. It should be an agrregator that can count any writers and
> support 1 reader.
> 
> Let me draw this:
> 
> W1 ---->\
> W2 ----> +--> eventfd ----> R
> W3 ---->/
> 
> 
> This explains the difference: in irqfd kvm reads eventfd so only 1 is
> supported. in ioeventfd (and eoifd) kvm is the writer so any number
> should be supported.

Yes, removal is the problem.  If I allow the same eventfd to fire for
multiple EOIs, then I need to match eventfd and irqfd on de-assign.
That means I need to cache some bit of context about the irqfd in case
it's closed before the eoifd.  Perhaps the _eoifd holds a reference to
_irqfd.eventfd for that.

> > > > +
> > > >  5. The kvm_run structure
> > > >  ------------------------
> > > >  
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 80bed07..62d6eca 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > >  	case KVM_CAP_PCI_2_3:
> > > >  	case KVM_CAP_KVMCLOCK_CTRL:
> > > >  	case KVM_CAP_IRQFD_LEVEL:
> > > > +	case KVM_CAP_EOIFD:
> > > >  		r = 1;
> > > >  		break;
> > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index b2e6e4f..7567e7d 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> > > >  #define KVM_CAP_S390_COW 79
> > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > >  #define KVM_CAP_IRQFD_LEVEL 81
> > > > +#define KVM_CAP_EOIFD 82
> > > >  
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > >  
> > > > @@ -694,6 +695,17 @@ struct kvm_irqfd {
> > > >  	__u8  pad[20];
> > > >  };
> > > >  
> > > > +#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
> > > > +#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
> > > > +
> > > > +struct kvm_eoifd {
> > > > +	__u32 fd;
> > > > +	__u32 gsi;
> > > > +	__u32 flags;
> > > > +	__u32 irqfd;
> > > > +	__u8 pad[16];
> > > > +};
> > > > +
> > > >  struct kvm_clock_data {
> > > >  	__u64 clock;
> > > >  	__u32 flags;
> > > > @@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
> > > >  #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
> > > >  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
> > > >  #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
> > > > +/* Available with KVM_CAP_EOIFD */
> > > > +#define KVM_EOIFD                 _IOW(KVMIO,  0xa8, struct kvm_eoifd)
> > > >  
> > > >  /*
> > > >   * ioctls for vcpu fds
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index ae3b426..83472eb 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -285,6 +285,10 @@ struct kvm {
> > > >  		struct list_head  items;
> > > >  	} irqfds;
> > > >  	struct list_head ioeventfds;
> > > > +	struct {
> > > > +		spinlock_t        lock;
> > > 
> > > Can't this be a mutex? It seems that code will be much simpler
> > > if you can just take the mutex, do everything and unlock.
> > 
> > Yeah, looks like it could.
> > 
> > > For example you could prevent changing eoifds while irqfds
> > > are changed.
> > 
> > I'm not sure we need to be that strict, but we could be sure to get a
> > reference to the _source_id using the irqfd lock that way... actually we
> > could do that in the current locking scheme too.
> 
> Just a suggestion.
> 
> > > > +		struct list_head  items;
> > > > +	} eoifds;
> > > >  #endif
> > > >  	struct kvm_vm_stat stat;
> > > >  	struct kvm_arch arch;
> > > > @@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
> > > >  void kvm_irqfd_release(struct kvm *kvm);
> > > >  void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
> > > >  int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
> > > > +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
> > > > +void kvm_eoifd_release(struct kvm *kvm);
> > > >  
> > > >  #else
> > > >  
> > > > @@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> > > >  	return -ENOSYS;
> > > >  }
> > > >  
> > > > +static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> > > > +{
> > > > +	return -ENOSYS;
> > > > +}
> > > > +
> > > > +static inline void kvm_eoifd_release(struct kvm *kvm) {}
> > > > +
> > > >  #endif /* CONFIG_HAVE_KVM_EVENTFD */
> > > >  
> > > >  #ifdef CONFIG_KVM_APIC_ARCHITECTURE
> > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > index 92aa5ba..2bc9768 100644
> > > > --- a/virt/kvm/eventfd.c
> > > > +++ b/virt/kvm/eventfd.c
> > > > @@ -62,8 +62,7 @@ static void _irq_source_put(struct _irq_source *source)
> > > >  		kref_put(&source->kref, _irq_source_release);
> > > >  }
> > > >  
> > > > -static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> > > > -_irq_source_get(struct _irq_source *source)
> > > > +static struct _irq_source *_irq_source_get(struct _irq_source *source)
> > > >  {
> > > >  	if (source)
> > > >  		kref_get(&source->kref);
> > > > @@ -119,6 +118,39 @@ struct _irqfd {
> > > >  	struct work_struct shutdown;
> > > >  };
> > > >  
> > > > +static struct _irqfd *_irqfd_fdget(struct kvm *kvm, int fd)
> > > > +{
> > > > +	struct eventfd_ctx *eventfd;
> > > > +	struct _irqfd *tmp, *irqfd = NULL;
> > > > +
> > > > +	eventfd = eventfd_ctx_fdget(fd);
> > > > +	if (IS_ERR(eventfd))
> > > > +		return (struct _irqfd *)eventfd;
> > > > +
> > > > +	spin_lock_irq(&kvm->irqfds.lock);
> > > > +
> > > > +	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
> > > > +		if (tmp->eventfd == eventfd) {
> > > > +			irqfd = tmp;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	spin_unlock_irq(&kvm->irqfds.lock);
> > > > +
> > > > +	if (!irqfd) {
> > > > +		eventfd_ctx_put(eventfd);
> > > > +		return ERR_PTR(-ENODEV);
> > > > +	}
> > > 
> > > Can irqfd get dassigned at this point?
> > > Could one way to fix be to take eoifd.lock in kvm_irqfd?
> > 
> > Yeah, I'm concerned this isn't a sufficient reference to the _irqfd.
> > I'll play with locking; I think we actually want the reverse, holding
> > irqfd.lock for the search through getting a reference to the
> > _irq_source.
> > 
> > > > +
> > > > +	return irqfd;
> > > > +}
> > > > +
> > > > +static void _irqfd_put(struct _irqfd *irqfd)
> > > > +{
> > > > +	eventfd_ctx_put(irqfd->eventfd);
> > > > +}
> > > > +
> > > >  static struct workqueue_struct *irqfd_cleanup_wq;
> > > >  
> > > >  static void
> > > > @@ -387,6 +419,8 @@ kvm_eventfd_init(struct kvm *kvm)
> > > >  	spin_lock_init(&kvm->irqfds.lock);
> > > >  	INIT_LIST_HEAD(&kvm->irqfds.items);
> > > >  	INIT_LIST_HEAD(&kvm->ioeventfds);
> > > > +	spin_lock_init(&kvm->eoifds.lock);
> > > > +	INIT_LIST_HEAD(&kvm->eoifds.items);
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -753,3 +787,173 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> > > >  
> > > >  	return kvm_assign_ioeventfd(kvm, args);
> > > >  }
> > > > +
> > > > +/*
> > > > + * --------------------------------------------------------------------
> > > > + *  eoifd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
> > > > + *
> > > > + *  userspace can register GSIs with an eventfd for receiving
> > > > + *  notification when an EOI occurs.
> > > > + * --------------------------------------------------------------------
> > > > + */
> > > > +
> > > > +struct _eoifd {
> > > > +	struct eventfd_ctx *eventfd;
> > > > +	struct _irq_source *source; /* for de-asserting level irqfd */
> > > > +	struct kvm *kvm;
> > > > +	struct kvm_irq_ack_notifier notifier;
> > > > +	struct list_head list;
> > > > +};
> > > > +
> > > > +static void eoifd_event(struct kvm_irq_ack_notifier *notifier)
> > > > +{
> > > > +	struct _eoifd *eoifd;
> > > > +
> > > > +	eoifd = container_of(notifier, struct _eoifd, notifier);
> > > > +
> > > > +	/*
> > > > +	 * If the eoifd is tied to a level irqfd we de-assert it here.
> > > > +	 * The user is responsible for re-asserting it if their device
> > > > +	 * still needs attention.  For notification-only, skip this.
> > > > +	 */
> > > > +	if (eoifd->source)
> > > > +		kvm_set_irq(eoifd->kvm, eoifd->source->id,
> > > > +			    eoifd->notifier.gsi, 0);
> > > > +
> > > 
> > > 
> > > This will fire even if the source in question did not assert an
> > > interrupt in the first place.  This won't scale well if
> > > the interrupt is shared. Don't we need to filter by source state?
> > 
> > Extra spurious EOIs are better than dropped EOIs.  However, it does seem
> > we could add an atomic_t to _irq_source tracking the assertion state of
> > our source id.  I'll experiment with that.
> 
> I don't see why do we need more state. kvm_irq_line_state already
> uses atomics for bits in irq_state.

kvm_irq_line_state:
(a) static inline in irq_comm.c
(b) changes bits
(c) requires irq_state from ???


> > > > +	eventfd_signal(eoifd->eventfd, 1);
> > > > +}
> > > > +
> > > > +static int kvm_assign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> > > > +{
> > > > +	struct eventfd_ctx *eventfd;
> > > > +	struct _eoifd *eoifd, *tmp;
> > > > +	struct _irq_source *source = NULL;
> > > > +
> > > > +	if (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) {
> > > > +		struct _irqfd *irqfd = _irqfd_fdget(kvm, args->irqfd);
> > > > +		if (IS_ERR(irqfd))
> > > > +			return PTR_ERR(irqfd);
> > > > +
> > > > +		if (irqfd->gsi != args->gsi) {
> > > > +			_irqfd_put(irqfd);
> > > > +			return -EINVAL;
> > > > +		}
> > > > +
> > > 
> > > This requirement does not seem to be documented.
> > > If we require this, do we need to pass in gsi at all?
> > 
> > It seemed obvious to me that when using KVM_EOIFD_FLAG_LEVEL_IRQFD the
> > gsi needs to match that used for the kvm_irqfd.  Suppose I should never
> > assume anything is obvious in an API spec.  I'll add a comment.  I think
> > it would make the user interface more confusing to specify that gsi is
> > not required when using a level irqfd,
> 
> Maybe just make it a union: one or the other?
> 
> 
> > besides it gives us an easy sanity check.
> 
> Well pass less parameters and there will be less bugs and less stuff
> to check :)
> 
> > > > +		source = _irq_source_get(irqfd->source);
> > > > +		_irqfd_put(irqfd);
> > > > +		if (!source)
> > > > +			return -EINVAL;
> > > > +	}
> > > > +
> > > > +	eventfd = eventfd_ctx_fdget(args->fd);
> > > > +	if (IS_ERR(eventfd)) {
> > > > +		_irq_source_put(source);
> > > > +		return PTR_ERR(eventfd);
> > > > +	}
> > > > +
> > > > +	eoifd = kzalloc(sizeof(*eoifd), GFP_KERNEL);
> > > > +	if (!eoifd) {
> > > > +		_irq_source_put(source);
> > > > +		eventfd_ctx_put(eventfd);
> > > > +		return -ENOMEM;
> > > > +	}
> > > > +
> > > > +	INIT_LIST_HEAD(&eoifd->list);
> > > > +	eoifd->kvm = kvm;
> > > > +	eoifd->eventfd = eventfd;
> > > > +	eoifd->source = source;
> > > > +	eoifd->notifier.gsi = args->gsi;
> > > > +	eoifd->notifier.irq_acked = eoifd_event;
> > > > +
> > > > +	spin_lock_irq(&kvm->eoifds.lock);
> > > > +
> > > > +	list_for_each_entry(tmp, &kvm->eoifds.items, list) {
> > > > +		if (eoifd->eventfd != tmp->eventfd)
> > > > +			continue;
> > > > +
> > > > +		spin_unlock_irq(&kvm->eoifds.lock);
> > > > +		_irq_source_put(source);
> > > > +		eventfd_ctx_put(eventfd);
> > > > +		kfree(eoifd);
> > > > +		return -EBUSY;
> > > 
> > > 
> > > Please rewrite this function using labels for error handling.
> > 
> > Ok
> > 
> > > > +	}
> > > > +
> > > > +	list_add_tail(&eoifd->list, &kvm->eoifds.items);
> > > > +
> > > > +	spin_unlock_irq(&kvm->eoifds.lock);
> > > > +
> > > > +	kvm_register_irq_ack_notifier(kvm, &eoifd->notifier);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static void eoifd_deactivate(struct kvm *kvm, struct _eoifd *eoifd)
> > > 
> > > Pls find a less confusing name for this.
> > > It just frees the eoifd. No tricky deactivation here (unlike irqfd).
> > 
> > Uh, confusing?  It does actually deactivate it by unregistering the irq
> > ack notifier.
> 
> It also destroys it completely. To me deactivate implies "make inactive
> but do not destroy".

Umm, but as you note irqfd_deactivate does result in destroying it.

> >  I was attempting to be consistent with the function
> > naming by following irqfd_deactivate, I didn't realize there was a
> > complexity requirement associated with the name.
> 
> irqfd_deactivate is named like this because it does not free the object:
> it merely deactivates it and queues up the free job.
> And that in turn is because of tricky locking issues specific to irqfd
> that have to do with the fact kvm reads it. eoifd is written
> from kvm so no such problem.

I fail to see the logic that deactivating and *queuing* a free job is
significantly different from deactivating and actually freeing.  In
either case, the object is gone at the end as far as the user of the
foo_deactivate function is concerned.  Maybe the irqfd function is
poorly named too.

> > > > +{
> > > > +	kvm_unregister_irq_ack_notifier(kvm, &eoifd->notifier);
> > > > +	_irq_source_put(eoifd->source);
> > > > +	eventfd_ctx_put(eoifd->eventfd);
> > > > +	kfree(eoifd);
> > > > +}
> > > > +
> > > > +void kvm_eoifd_release(struct kvm *kvm)
> > > > +{
> > > > +	spin_lock_irq(&kvm->eoifds.lock);
> > > > +
> > > > +	while (!list_empty(&kvm->eoifds.items)) {
> > > > +		struct _eoifd *eoifd;
> > > > +
> > > > +		eoifd = list_first_entry(&kvm->eoifds.items,
> > > > +					 struct _eoifd, list);
> > > > +		list_del(&eoifd->list);
> > > > +
> > > > +		/* Drop spinlocks since eoifd_deactivate can sleep */
> > > > +		spin_unlock_irq(&kvm->eoifds.lock);
> > > > +		eoifd_deactivate(kvm, eoifd);
> > > > +		spin_lock_irq(&kvm->eoifds.lock);
> > > > +	}
> > > > +
> > > > +	spin_unlock_irq(&kvm->eoifds.lock);
> > > > +}
> > > > +
> > > > +static int kvm_deassign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> > > > +{
> > > > +	struct eventfd_ctx *eventfd;
> > > > +	struct _eoifd *tmp, *eoifd = NULL;
> > > > +
> > > > +	eventfd = eventfd_ctx_fdget(args->fd);
> > > > +	if (IS_ERR(eventfd))
> > > > +		return PTR_ERR(eventfd);
> > > > +
> > > > +	spin_lock_irq(&kvm->eoifds.lock);
> > > > +
> > > > +	list_for_each_entry(tmp, &kvm->eoifds.items, list) {
> > > > +		if (tmp->eventfd == eventfd && tmp->notifier.gsi == args->gsi) {
> > > 
> > > This is repeated several times.
> > > looks like eoifd_collision function is called for.
> > 
> > The tests aren't quite identical.  Here we test eventfd and gsi, the
> > latter primarily to validate kvm_eoifd parameters.  Further above we
> > test that eventfd is unique on addition without comparing gsi.  Both of
> > these match the existing precedent set by irqfd.  With _irqfd_fdget we
> > do have two searches of the irqfds.items list comparing only eventfd...
> > I'll see about making a trivial helper there.  Thanks,
> > 
> > Alex
> 
> I hope I clarified that irqfd is not a good match. Pls make it
> behave like ioeventfd instead.
> 
> > > > +			eoifd = tmp;
> > > > +			list_del(&eoifd->list);
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	spin_unlock_irq(&kvm->eoifds.lock);
> > > > +
> > > > +	eventfd_ctx_put(eventfd);
> > > > +
> > > > +	if (!eoifd)
> > > > +		return -ENOENT;
> > > > +
> > > > +	eoifd_deactivate(kvm, eoifd);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> > > > +{
> > > > +	if (args->flags & ~(KVM_EOIFD_FLAG_DEASSIGN |
> > > > +			    KVM_EOIFD_FLAG_LEVEL_IRQFD))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (args->flags & KVM_EOIFD_FLAG_DEASSIGN)
> > > > +		return kvm_deassign_eoifd(kvm, args);
> > > > +
> > > > +	return kvm_assign_eoifd(kvm, args);
> > > > +}
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index b4ad14cc..5b41df1 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -620,6 +620,8 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
> > > >  
> > > >  	kvm_irqfd_release(kvm);
> > > >  
> > > > +	kvm_eoifd_release(kvm);
> > > > +
> > > >  	kvm_put_kvm(kvm);
> > > >  	return 0;
> > > >  }
> > > > @@ -2093,6 +2095,15 @@ static long kvm_vm_ioctl(struct file *filp,
> > > >  		break;
> > > >  	}
> > > >  #endif
> > > > +	case KVM_EOIFD: {
> > > > +		struct kvm_eoifd data;
> > > > +
> > > > +		r = -EFAULT;
> > > > +		if (copy_from_user(&data, argp, sizeof data))
> > > > +			goto out;
> > > > +		r = kvm_eoifd(kvm, &data);
> > > > +		break;
> > > > +	}
> > > >  	default:
> > > >  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> > > >  		if (r == -ENOTTY)
> > 
> > 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/