linux-kernel - Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests

Date:	Mon, 05 Oct 2009 19:33:53 -0400
From:	Gregory Haskins <gregory.haskins@...il.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
CC:	Gregory Haskins <ghaskins@...ell.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"alacrityvm-devel@...ts.sourceforge.net" 
	<alacrityvm-devel@...ts.sourceforge.net>
Subject: Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction
 with guests

Hi Marcelo!

Marcelo Tosatti wrote:
> On Fri, Oct 02, 2009 at 04:19:27PM -0400, Gregory Haskins wrote:
>> What: xinterface is a mechanism that allows kernel modules external to
>> the kvm.ko proper to interface with a running guest.  It accomplishes
>> this by creating an abstracted interface which does not expose any
>> private details of the guest or its related KVM structures, and provides
>> a mechanism to find and bind to this interface at run-time.
>>
>> Why: There are various subsystems that would like to interact with a KVM
>> guest which are ideally suited to exist outside the domain of the kvm.ko
>> core logic. For instance, external pci-passthrough, virtual-bus, and
>> virtio-net modules are currently under development.  In order for these
>> modules to successfully interact with the guest, they need, at the very
>> least, various interfaces for signaling IO events, pointer translation,
>> and possibly memory mapping.
>>
>> The signaling case is covered by the recent introduction of the
>> irqfd/ioeventfd mechanisms.  This patch provides a mechanism to cover the
>> other cases.  Note that today we only expose pointer-translation related
>> functions, but more could be added at a future date as needs arise.
>>
>> Example usage: QEMU instantiates a guest, and an external module "foo"
>> that desires the ability to interface with the guest (say via
>> open("/dev/foo")).  QEMU may then pass the kvmfd to foo via an
>> ioctl, such as: ioctl(foofd, FOO_SET_VMID, &kvmfd).  Upon receipt, the
>> foo module can issue kvm_xinterface_bind(kvmfd) to acquire
>> the proper context.  Internally, the struct kvm* and associated
>> struct module* will remain pinned at least until the foo module calls
>> kvm_xinterface_put().
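The bind/put lifetime described above can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not the patch's code: struct vm, struct xintf, xintf_bind(), xintf_put() and vm_create() are hypothetical stand-ins for struct kvm, struct kvm_xinterface, kvm_xinterface_bind() and kvm_xinterface_put(), and the refcount is a plain int where kernel code would use an atomic or a kref. It shows the two ideas in the description: the external module only ever holds a pointer to the public struct (the private wrapper is recovered with container_of(), as to_intf() does in the quoted patch), and the VM stays pinned until the module drops its binding, even if the owner lets go first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public handle (hypothetical): the only thing an external module sees. */
struct xintf {
	unsigned long (*gpa_to_hva)(struct xintf *intf, unsigned long gpa);
	void (*release)(struct xintf *intf);
};

/* Hypothetical stand-in for struct kvm: refcounted, freed on last put. */
struct vm {
	int refcount;
	unsigned long hva_base;	/* toy model: one flat memslot at gpa 0 */
};

/* Private wrapper; code holding only a struct xintf * cannot reach ->vm. */
struct priv_xintf {
	struct vm *vm;
	struct xintf intf;
};

static int vms_freed;	/* counts VM teardowns so a caller can observe them */

static void vm_put(struct vm *vm)
{
	assert(vm->refcount > 0);
	if (--vm->refcount == 0) {
		free(vm);
		vms_freed++;
	}
}

static unsigned long priv_gpa_to_hva(struct xintf *intf, unsigned long gpa)
{
	struct priv_xintf *p = container_of(intf, struct priv_xintf, intf);

	return p->vm->hva_base + gpa;
}

static void priv_release(struct xintf *intf)
{
	struct priv_xintf *p = container_of(intf, struct priv_xintf, intf);

	vm_put(p->vm);	/* drop the pin taken at bind time */
	free(p);
}

/* Hypothetical analogue of kvm_xinterface_bind(): pins the VM. */
struct xintf *xintf_bind(struct vm *vm)
{
	struct priv_xintf *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	vm->refcount++;
	p->vm = vm;
	p->intf.gpa_to_hva = priv_gpa_to_hva;
	p->intf.release = priv_release;
	return &p->intf;
}

/* Hypothetical analogue of kvm_xinterface_put(). */
void xintf_put(struct xintf *intf)
{
	intf->release(intf);
}

/* Creates a VM holding one reference on behalf of its owner (e.g. QEMU). */
struct vm *vm_create(unsigned long hva_base)
{
	struct vm *vm = malloc(sizeof(*vm));

	if (!vm)
		return NULL;
	vm->refcount = 1;
	vm->hva_base = hva_base;
	return vm;
}
```

In this model the owner can call vm_put() while a binding is still live; the memory is only released when the last xintf_put() runs.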
> 
>> --- /dev/null
>> +++ b/virt/kvm/xinterface.c
>> @@ -0,0 +1,409 @@
>> +/*
>> + * KVM module interface - Allows external modules to interface with a guest
>> + *
>> + * Copyright 2009 Novell.  All Rights Reserved.
>> + *
>> + * Author:
>> + *      Gregory Haskins <ghaskins@...ell.com>
>> + *
>> + * This file is free software; you can redistribute it and/or modify
>> + * it under the terms of version 2 of the GNU General Public License
>> + * as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software Foundation,
>> + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
>> + */
>> +
>> +#include <linux/mm.h>
>> +#include <linux/vmalloc.h>
>> +#include <linux/highmem.h>
>> +#include <linux/module.h>
>> +#include <linux/mmu_context.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/kvm_xinterface.h>
>> +
>> +struct _xinterface {
>> +	struct kvm             *kvm;
>> +	struct task_struct     *task;
>> +	struct mm_struct       *mm;
>> +	struct kvm_xinterface   intf;
>> +	struct kvm_memory_slot *slotcache[NR_CPUS];
>> +};
>> +
>> +struct _xvmap {
>> +	struct kvm_memory_slot    *memslot;
>> +	unsigned long              npages;
>> +	struct kvm_xvmap           vmap;
>> +};
>> +
>> +static struct _xinterface *
>> +to_intf(struct kvm_xinterface *intf)
>> +{
>> +	return container_of(intf, struct _xinterface, intf);
>> +}
>> +
>> +#define _gfn_to_hva(gfn, memslot) \
>> +	(memslot->userspace_addr + (gfn - memslot->base_gfn) * PAGE_SIZE)
>> +
>> +/*
>> + * gpa_to_hva() - translate a guest-physical to host-virtual using
>> + * a per-cpu cache of the memslot.
>> + *
>> + * The gfn_to_memslot() call is relatively expensive, and the gpa access
>> + * patterns exhibit a high degree of locality.  Therefore, let's cache
>> + * the last slot used on a per-cpu basis to optimize the lookup.
>> + *
>> + * assumes slots_lock held for read
>> + */
>> +static unsigned long
>> +gpa_to_hva(struct _xinterface *_intf, unsigned long gpa)
>> +{
>> +	int                     cpu     = get_cpu();
>> +	unsigned long           gfn     = gpa >> PAGE_SHIFT;
>> +	struct kvm_memory_slot *memslot = _intf->slotcache[cpu];
>> +	unsigned long           addr    = 0;
>> +
>> +	if (!memslot
>> +	    || gfn < memslot->base_gfn
>> +	    || gfn >= memslot->base_gfn + memslot->npages) {
>> +
>> +		memslot = gfn_to_memslot(_intf->kvm, gfn);
>> +		if (!memslot)
>> +			goto out;
>> +
>> +		_intf->slotcache[cpu] = memslot;
>> +	}
>> +
>> +	addr = _gfn_to_hva(gfn, memslot) + offset_in_page(gpa);
>> +
>> +out:
>> +	put_cpu();
>> +
>> +	return addr;
> 
> Please optimize gfn_to_memslot() instead, so everybody benefits. It
> shows very often on profiles.
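Marcelo's suggestion is to make the shared lookup itself cheaper rather than caching around it. One possible direction, sketched here in userspace C as an assumption and not something proposed in this thread, is to keep the slots sorted by base_gfn and binary-search them, so a lookup costs O(log n) instead of a linear scan over all slots. struct memslot, lookup_memslot() and gpa_to_hva_sorted() are hypothetical names; the address arithmetic matches the quoted _gfn_to_hva() macro plus offset_in_page(gpa).

```c
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical, trimmed-down memslot: just the fields the lookup needs. */
struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned long userspace_addr;
};

/*
 * Binary search over slots sorted by base_gfn (assumed non-overlapping).
 * Returns NULL when gfn falls in a hole between slots.
 */
static struct memslot *
lookup_memslot(struct memslot *slots, size_t nslots, unsigned long gfn)
{
	size_t lo = 0, hi = nslots;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct memslot *s = &slots[mid];

		if (gfn < s->base_gfn)
			hi = mid;		/* target is to the left */
		else if (gfn >= s->base_gfn + s->npages)
			lo = mid + 1;		/* target is to the right */
		else
			return s;		/* gfn lies inside this slot */
	}
	return NULL;
}

/* Same translation as _gfn_to_hva() + offset_in_page(); 0 means "no slot". */
static unsigned long
gpa_to_hva_sorted(struct memslot *slots, size_t nslots, unsigned long gpa)
{
	unsigned long gfn = gpa >> PAGE_SHIFT;
	struct memslot *s = lookup_memslot(slots, nslots, gfn);

	if (!s)
		return 0;
	return s->userspace_addr + (gfn - s->base_gfn) * PAGE_SIZE
	       + (gpa & (PAGE_SIZE - 1));
}
```

Keeping the array sorted moves the cost to slot insertion and removal, which are rare compared with translations.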

Yeah, it's not a bad idea.  The reason I did it here is that the
requirements for sync (kvm-vcpu) and async (xinterface) access are
slightly different: sync is probably optimal with per-vcpu caching,
whereas async is optimal with per-cpu caching.

That said, we could probably make the entire algorithm per-cpu as a
compromise and still gain some of the benefit.  Perhaps I will split
this out as a separate patch for v3.
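The per-cpu caching scheme in the quoted gpa_to_hva(), which Greg suggests could be generalized to every caller, can be modelled in userspace as well. In this sketch (hypothetical names throughout) the cpu number is an explicit argument because userspace has no get_cpu(), slow_lookup() stands in for gfn_to_memslot(), and slow_lookups counts cache misses so the effect of access locality is visible.

```c
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define NCPUS      4	/* hypothetical stand-in for NR_CPUS */

struct memslot {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned long userspace_addr;
};

struct xlate_ctx {
	struct memslot *slots;
	size_t nslots;
	struct memslot *slotcache[NCPUS];	/* last slot hit, per cpu */
	unsigned long slow_lookups;		/* how often the cache missed */
};

/* Linear scan standing in for gfn_to_memslot(). */
static struct memslot *slow_lookup(struct xlate_ctx *ctx, unsigned long gfn)
{
	ctx->slow_lookups++;
	for (size_t i = 0; i < ctx->nslots; i++) {
		struct memslot *s = &ctx->slots[i];

		if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
			return s;
	}
	return NULL;
}

/*
 * Mirrors the quoted gpa_to_hva(): try this cpu's cached slot first and
 * fall back to the slow lookup only on a miss.  In the kernel, get_cpu()
 * disables preemption so the task stays on this cpu while it reads and
 * updates its cache entry; here the caller simply passes 'cpu'.
 */
static unsigned long
cached_gpa_to_hva(struct xlate_ctx *ctx, int cpu, unsigned long gpa)
{
	unsigned long gfn = gpa >> PAGE_SHIFT;
	struct memslot *s = ctx->slotcache[cpu];

	if (!s || gfn < s->base_gfn || gfn >= s->base_gfn + s->npages) {
		s = slow_lookup(ctx, gfn);
		if (!s)
			return 0;	/* no slot: same convention as the patch */
		ctx->slotcache[cpu] = s;
	}
	return s->userspace_addr + (gfn - s->base_gfn) * PAGE_SIZE
	       + (gpa & (PAGE_SIZE - 1));
}
```

Repeated accesses within one slot from the same cpu hit the cache; a different cpu, or a jump to another slot, pays for one slow lookup and then hits again.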

> 
>> +
>> +	page_list = (struct page **) __get_free_page(GFP_KERNEL);
>> +	if (!page_list)
>> +		return NULL;
>> +
>> +	down_write(&mm->mmap_sem);
>> +
>> +	ret = get_user_pages(p, mm, addr, npages, 1, 0, page_list, NULL);
>> +	if (ret < 0)
>> +		goto out;
>> +
>> +	ptr = vmap(page_list, npages, VM_MAP, PAGE_KERNEL);
>> +	if (ptr)
>> +		mm->locked_vm += npages;
> 
> Why don't you use gfn_to_page (here and elsewhere in the patch).

Primarily ignorance, I suspect ;)

The truth is I ported this from one of our other connectors, which was
more userspace-oriented, so get_user_pages() (gup) made sense and
gfn_to_page() (gtp) was not an option.  That said, it probably doesn't
matter much in the vmap case, because that is a slow path.  However, I
will definitely look at switching over to the gtp() variant, especially
where it affects fast-path code.
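Switching the vmap path from get_user_pages() to gfn_to_page() turns one bulk call into a per-page loop. The userspace sketch below models that loop under assumptions: model_gfn_to_page() and model_release_page() are hypothetical stand-ins for gfn_to_page() and a release helper such as kvm_release_page_clean(), a NULL return models a gfn with no backing page (the kernel helper reports failure via an error page rather than NULL), and the filled page_list is what would then be handed to vmap() as in the quoted hunk.

```c
#include <stddef.h>

#define NPAGES_GUEST 16	/* toy guest with 16 frames */

/* Hypothetical page model: a reference count and a "backed" flag. */
struct page {
	int refcount;
	int valid;	/* 0 models an unbacked gfn */
};

static struct page guest_pages[NPAGES_GUEST];

/* Hypothetical stand-in for gfn_to_page(): takes a reference or fails. */
static struct page *model_gfn_to_page(unsigned long gfn)
{
	if (gfn >= NPAGES_GUEST || !guest_pages[gfn].valid)
		return NULL;
	guest_pages[gfn].refcount++;
	return &guest_pages[gfn];
}

/* Hypothetical stand-in for a release helper like kvm_release_page_clean(). */
static void model_release_page(struct page *page)
{
	page->refcount--;
}

/*
 * Pin npages consecutive gfns into page_list, all or nothing: on the first
 * failure, drop the references already taken and report failure.
 * Returns 0 on success, -1 on failure.
 */
static int pin_gfn_range(unsigned long gfn, unsigned long npages,
			 struct page **page_list)
{
	for (unsigned long i = 0; i < npages; i++) {
		page_list[i] = model_gfn_to_page(gfn + i);
		if (!page_list[i]) {
			while (i--)	/* release entries i-1 .. 0 */
				model_release_page(page_list[i]);
			return -1;
		}
	}
	return 0;
}
```

Making the loop all-or-nothing means the caller never has to track a partially pinned list before deciding whether to vmap().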

Thanks Marcelo,
-Greg

