lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <E959C4978C3B6342920538CF579893F0025D800A@SHSMSX104.ccr.corp.intel.com>
Date:	Mon, 29 Jun 2015 13:27:50 +0000
From:	"Wu, Feng" <feng.wu@...el.com>
To:	Alex Williamson <alex.williamson@...hat.com>
CC:	Eric Auger <eric.auger@...aro.org>,
	Avi Kivity <avi.kivity@...il.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>,
	"mtosatti@...hat.com" <mtosatti@...hat.com>,
	Joerg Roedel <joro@...tes.org>, "Wu, Feng" <feng.wu@...el.com>
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@...hat.com]
> Sent: Friday, June 19, 2015 4:04 AM
> To: Wu, Feng
> Cc: Eric Auger; Avi Kivity; kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
> pbonzini@...hat.com; mtosatti@...hat.com; Joerg Roedel
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> 
> [Adding Joerg since he was part of this original idea]
> 
> On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@...hat.com]
> > > Sent: Tuesday, June 16, 2015 12:45 AM
> > > To: Eric Auger
> > > Cc: Avi Kivity; Wu, Feng; kvm@...r.kernel.org;
> linux-kernel@...r.kernel.org;
> > > pbonzini@...hat.com; mtosatti@...hat.com
> > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > >
> > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > > > Hi Alex, all,
> > > > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: Avi Kivity [mailto:avi.kivity@...il.com]
> > > > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > > > >>>>> To: Wu, Feng; kvm@...r.kernel.org; linux-kernel@...r.kernel.org
> > > > >>>>> Cc: pbonzini@...hat.com; mtosatti@...hat.com;
> > > > >>>>> alex.williamson@...hat.com; eric.auger@...aro.org
> > > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > > > >>>>>
> > > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > > >>>>>> From: Eric Auger <eric.auger@...aro.org>
> > > > >>>>>>
> > > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> > > group
> > > > >>>>>> and 2 device attributes:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> > > able
> > > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > > > >>>>>> the command takes as argument a handle to a new struct named
> > > > >>>>>> kvm_vfio_dev_irq.
> > > > >>>>> Is there no way to do this automatically?  After all, vfio knows that
> a
> > > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> > > some
> > > > >>>>> eventfd is forwarded to a guest interrupt.  If they compare notes
> > > > >>>>> through a central registry, they can figure out that the interrupt
> needs
> > > > >>>>> to be forwarded.
> > > > >>>> Oh, just like Eric mentioned in his reply, this description is out of
> context
> > > of
> > > > >>>> this series, I will remove them in the next version.
> > > > >>>
> > > > >>> I suspect Avi's question was more general.  While forward/unforward
> is
> > > > >>> out of context for this series, it's very similar in nature to
> > > > >>> enabling/disabling posted interrupts.  So I think the question remains
> > > > >>> whether we really need userspace to participate in creating this
> > > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > > >>> automatically.
> > > > >>>
> > > > >>> Personally I don't know how we could do it automatically.  We've
> always
> > > > >>> relied on userspace to independently setup vfio and kvm such that
> > > > >>> neither have any idea that the other is there and update each side
> > > > >>> independently when anything changes.  So it seems consistent to
> > > continue
> > > > >>> that here.  It doesn't seem like there's much to gain
> performance-wise
> > > > >>> either, updates should be a relatively rare event I'd expect.
> > > > >>>
> > > > >>> There's really no metadata associated with an eventfd, so "comparing
> > > > >>> notes" automatically might imply some central registration entity.
> That
> > > > >>> immediately sounds like a much more complex solution, but maybe Avi
> > > has
> > > > >>> some ideas to manage it.  Thanks,
> > > > >>>
> > > > >>
> > > > >> The idea is to have a central registry maintained by a posted interrupts
> > > > >> manager.  Both vfio and kvm pass the filp (along with extra
> information)
> > > > >> to the posted interrupts manager, which, when it detects a filp match,
> > > > >> tells each of them what to do.
> > > > >>
> > > > >> The advantages are:
> > > > >> - old userspace gains the optimization without change
> > > > >> - a userspace API is more expensive to maintain than internal kernel
> > > > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > > > >> - if you can do it without a new interface, this indicates that all the
> > > > >> information in the new interface is redundant.  That means you have
> to
> > > > >> check it for consistency with the existing information, so it's extra
> > > > >> work (likely, it's exactly what the posted interrupt manager would be
> > > > >> doing anyway).
> > > > >
> > > > > Yep, those all sound like good things and I believe that's similar in
> > > > > design to the way we had originally discussed this interaction at
> > > > > LPC/KVM Forum several years ago.  I'd be in favor of that approach.
> > > >
> > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > > > IRQ forward control" series? Or is that "central registry maintained by
> > > > a posted interrupts manager" something more specific to x86?
> > >
> > > I'd think we'd want it for any sort of offload and supporting both
> > > posted-interrupts and irq-forwarding would be a good validation.  I
> > > imagine there would be registration/de-registration callbacks separate
> > > for interrupt producers vs interrupt consumers.  Each registration
> > > function would likely provide a struct of callbacks, probably similar to
> > > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> > > producer side.  The eventfd would be the token that the manager would
> > > use to match producers and consumers.  The hard part is probably
> > > figuring out what information to retrieve from the producer and provide
> > > to the consumer in a generic way between pci and platform, but as an
> > > internal interface, it's not a big deal if we screw it up a few times to
> > > start.  Thanks,
> >
> > On posted-interrupts side, the main purpose of the new APIs is to update
> > the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
> > any detailed ideas for the new solution to achieve this purpose? It should
> > be helpful if you can share some!
> 
> 
> There are plenty of details to be filled in, but I think the basics
> looks something like the code below.  The IRQ bypass manager just
> defines a pair of structures, one for interrupt producers and one for
> interrupt consumers.  I'm certain that we'll need more callbacks than
> I've defined below, but figuring out what those should be for the best
> abstraction is the hardest part of this idea.  The manager provides both
> registration and de-registration interfaces for both types of objects
> and keeps lists for each, protected by a lock.  The manager doesn't even
> really need to know what the match token is, but I assume for our
> purposes it will be an eventfd_ctx.
> 
> On the vfio side, the producer struct would be embedded in the
> vfio_pci_irq_ctx struct.  KVM would probably embed the consumer struct
> in _irqfd.  As I've coded below, the IRQ bypass manager calls the
> consumer callbacks, so the producer struct would need fields or
> callbacks to provide the consumer the info it needs.  AIUI the Posted
> Interrupt model, VFIO only needs to provide data to the consumer.  For
> IRQ Forwarding, I think the producer needs to be informed when bypass is
> active to model the incoming interrupt as edge vs level.
> 
> I've prototyped the base IRQ bypass manager here as static, but I don't
> see any reason it couldn't be a module that's loaded by dependency when
> either vfio-pci or kvm-intel is loaded (or other producer/consumer
> objects).
> 
> Is this a reasonable starting point to craft the additional fields and
> callbacks and interaction of who calls who that we need to support
> Posted Interrupts and IRQ Forwarding?  Is the AMD version of this still
> alive?  Thanks,
> 
> Alex
> 
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 413a7bf..22f6fcb 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -61,6 +61,7 @@ config KVM_INTEL
>  	depends on KVM
>  	# for perf_guest_get_msrs():
>  	depends on CPU_SUP_INTEL
> +	select IRQ_BYPASS_MANAGER
>  	---help---
>  	  Provides support for KVM on Intel processors equipped with the VT
>  	  extensions.
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 579d83b..02912f1 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -2,6 +2,7 @@ config VFIO_PCI
>  	tristate "VFIO support for PCI devices"
>  	depends on VFIO && PCI && EVENTFD
>  	select VFIO_VIRQFD
> +	select IRQ_BYPASS_MANAGER
>  	help
>  	  Support for the PCI VFIO bus driver.  This is required to make
>  	  use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..4e053be 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
> 
>  	if (vdev->ctx[0].trigger) {
>  		free_irq(pdev->irq, vdev);
> +		/* irq_bypass_unregister_producer(); */
>  		kfree(vdev->ctx[0].name);
>  		eventfd_ctx_put(vdev->ctx[0].trigger);
>  		vdev->ctx[0].trigger = NULL;
> @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
>  		return ret;
>  	}
> 
> +	/* irq_bypass_register_producer(); */
> +
>  	/*
>  	 * INTx disable will stick across the new irq setup,
>  	 * disable_irq won't.
> @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> 
>  	if (vdev->ctx[vector].trigger) {
>  		free_irq(irq, vdev->ctx[vector].trigger);
> +		/* irq_bypass_unregister_producer(); */
>  		kfree(vdev->ctx[vector].name);
>  		eventfd_ctx_put(vdev->ctx[vector].trigger);
>  		vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
>  		return ret;
>  	}
> 
> +	/* irq_bypass_register_producer(); */
> +
>  	vdev->ctx[vector].trigger = trigger;
> 
>  	return 0;
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> new file mode 100644
> index 0000000..718508e
> --- /dev/null
> +++ b/include/linux/irqbypass.h
> @@ -0,0 +1,23 @@
> +#ifndef IRQBYPASS_H
> +#define IRQBYPASS_H
> +
> +#include <linux/list.h>
> +
> +struct irq_bypass_producer {
> +	struct list_head node;
> +	void *token;
> +	/* TBD */
> +};
> +
> +struct irq_bypass_consumer {
> +	struct list_head node;
> +	void *token;
> +	void (*add_producer)(struct irq_bypass_producer *);
> +	void (*del_producer)(struct irq_bypass_producer *);

These two callbacks should be common function, for PI, I need to add
something specific to x86, such as, updating the associated IRTE, how
should I do for this?

> +};
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *);
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *);
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *);
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
> +#endif /* IRQBYPASS_H */
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index 9a76e3b..4502cdc 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -100,4 +100,7 @@ config SPARSE_IRQ
> 
>  	  If you don't know what to do here, say N.
> 
> +config IRQ_BYPASS_MANAGER
> +	bool
> +
>  endmenu
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index d121235..a30ed77 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
>  obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
>  obj-$(CONFIG_PM_SLEEP) += pm.o
>  obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
> diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
> new file mode 100644
> index 0000000..5d0f92b
> --- /dev/null
> +++ b/kernel/irq/bypass.c

Is it better to put this code here or in vfio folder?

Thanks,
Feng

> @@ -0,0 +1,116 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Various virtualization hardware acceleration techniques allow bypassing
> + * or offloading interrupts receieved from devices around the host kernel.
> + * Posted Interrupts on Intel VT-d systems can allow interrupts to be
> + * recieved directly by a virtual machine.  ARM IRQ Forwarding can allow
> + * level triggered device interrupts to be de-asserted directly by the VM.
> + * This manager allows interrupt producers and consumers to find each other
> + * to enable this sort of bypass.
> + */
> +
> +#include <linux/irqbypass.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +
> +static LIST_HEAD(producers);
> +static LIST_HEAD(consumers);
> +static DEFINE_MUTEX(lock);
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> +{
> +	struct irq_bypass_producer *tmp;
> +	struct irq_bypass_consumer *consumer;
> +	int ret = 0;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &producers, node) {
> +		if (tmp->token == producer->token) {
> +			ret = -EINVAL;
> +			goto unlock;
> +		}
> +	}
> +
> +	list_add(&producer->node, &producers);
> +
> +	list_for_each_entry(consumer, &consumers, node) {
> +		if (consumer->token == producer->token) {
> +			consumer->add_producer(producer);
> +			break;
> +		}
> +	}
> +unlock:
> +	mutex_unlock(&lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
> +
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> +{
> +	struct irq_bypass_consumer *consumer;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(consumer, &consumers, node) {
> +		if (consumer->token == producer->token) {
> +			consumer->del_producer(producer);
> +			break;
> +		}
> +	}
> +
> +	list_del(&producer->node);
> +
> +	mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
> +
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
> +{
> +	struct irq_bypass_consumer *tmp;
> +	struct irq_bypass_producer *producer;
> +	int ret = 0;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(tmp, &consumers, node) {
> +		if (tmp->token == consumer->token) {
> +			ret = -EINVAL;
> +			goto unlock;
> +		}
> +	}
> +
> +	list_add(&consumer->node, &consumers);
> +
> +	list_for_each_entry(producer, &producers, node) {
> +		if (producer->token == consumer->token) {
> +			consumer->add_producer(producer);
> +			break;
> +		}
> +	}
> +unlock:
> +	mutex_unlock(&lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
> +
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> +{
> +	struct irq_bypass_producer *producer;
> +
> +	mutex_lock(&lock);
> +
> +	list_for_each_entry(producer, &producers, node) {
> +		if (producer->token == consumer->token) {
> +			consumer->del_producer(producer);
> +			break;
> +		}
> +	}
> +
> +	list_del(&consumer->node);
> +
> +	mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..f3da161 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd
> *args)
>  	 */
>  	fdput(f);
> 
> +	/* irq_bypass_register_consumer(); */
> +
>  	return 0;
> 
>  fail:
> @@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd
> *args)
>  	struct _irqfd *irqfd, *tmp;
>  	struct eventfd_ctx *eventfd;
> 
> +	/* irq_bypass_unregister_consumer() */
> +
>  	eventfd = eventfd_ctx_fdget(args->fd);
>  	if (IS_ERR(eventfd))
>  		return PTR_ERR(eventfd);
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ