Message-ID: <1548439286.17444.14.camel@amazon.de>
Date:   Fri, 25 Jan 2019 18:01:26 +0000
From:   "Raslan, KarimAllah" <karahmed@...zon.de>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "david@...hat.com" <david@...hat.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "rkrcmar@...hat.com" <rkrcmar@...hat.com>
Subject: Re: [PATCH v5 04/13] KVM: Introduce a new guest mapping API

On Thu, 2019-01-10 at 14:07 +0100, David Hildenbrand wrote:
> On 09.01.19 10:42, KarimAllah Ahmed wrote:
> > 
> > In KVM, especially for nested guests, there is a dominant pattern of:
> > 
> > 	=> map guest memory -> do_something -> unmap guest memory
> > 
> > In addition to all the unnecessary noise in the code due to boilerplate
> > code, most of the time the mapping function does not properly handle memory
> > that is not backed by "struct page". This new guest mapping API encapsulates
> > most of this boilerplate code and also handles guest memory that is not
> > backed by "struct page".
> > 
> > The current implementation of this API is using memremap for memory that is
> > not backed by a "struct page" which would lead to a huge slow-down if it
> > was used for high-frequency mapping operations. The API does not have any
> > effect on current setups where guest memory is backed by a "struct page".
> > Further patches are going to also introduce a pfn-cache which would
> > significantly improve the performance of the memremap case.
> > 
> > Signed-off-by: KarimAllah Ahmed <karahmed@...zon.de>
> > ---
> > v3 -> v4:
> > - Update the commit message.
> > v1 -> v2:
> > - Drop the caching optimization (pbonzini)
> > - Use 'hva' instead of 'kaddr' (pbonzini)
> > - Return 0/-EINVAL/-EFAULT instead of true/false. -EFAULT will be used for
> >   AMD patch (pbonzini)
> > - Introduce __kvm_map_gfn which accepts a memory slot and use it (pbonzini)
> > - Only clear map->hva instead of memsetting the whole structure.
> > - Drop kvm_vcpu_map_valid since it is no longer used.
> > - Fix EXPORT_MODULE naming.
> > ---
> >  include/linux/kvm_host.h |  9 ++++++++
> >  virt/kvm/kvm_main.c      | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 62 insertions(+)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c38cc5e..8a2f5fa 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -205,6 +205,13 @@ enum {
> >  	READING_SHADOW_PAGE_TABLES,
> >  };
> >  
> > +struct kvm_host_map {
> > +	struct page *page;
> 
> Can you add some comments on what it means when there is a page vs.
> when there is none?
> 
> > 
> > +	void *hva;
> > +	kvm_pfn_t pfn;
> > +	kvm_pfn_t gfn;
> > +};
> > +
> >  /*
> >   * Sometimes a large or cross-page mmio needs to be broken up into separate
> >   * exits for userspace servicing.
> > @@ -710,7 +717,9 @@ struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
> >  struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
> >  kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
> >  kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
> > +int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
> >  struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
> > +void kvm_vcpu_unmap(struct kvm_host_map *map, bool dirty);
> >  unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
> >  unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
> >  int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 1f888a1..4d8f2e3 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1733,6 +1733,59 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
> >  }
> >  EXPORT_SYMBOL_GPL(gfn_to_page);
> >  
> > +static int __kvm_map_gfn(struct kvm_memory_slot *slot, gfn_t gfn,
> > +			 struct kvm_host_map *map)
> > +{
> > +	kvm_pfn_t pfn;
> > +	void *hva = NULL;
> > +	struct page *page = NULL;
> 
> nit: I prefer these in a growing line-length fashion.
> 
> > 
> > +
> > +	pfn = gfn_to_pfn_memslot(slot, gfn);
> > +	if (is_error_noslot_pfn(pfn))
> > +		return -EINVAL;
> > +
> > +	if (pfn_valid(pfn)) {
> > +		page = pfn_to_page(pfn);
> > +		hva = kmap(page);
> > +	} else {
> > +		hva = memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
> > +	}
> > +
> > +	if (!hva)
> > +		return -EFAULT;
> > +
> > +	map->page = page;
> > +	map->hva = hva;
> > +	map->pfn = pfn;
> > +	map->gfn = gfn;
> > +
> > +	return 0;
> > +}
> > +
> > +int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
> > +{
> > +	return __kvm_map_gfn(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn, map);
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_vcpu_map);
> > +
> > +void kvm_vcpu_unmap(struct kvm_host_map *map, bool dirty)
> > +{
> > +	if (!map->hva)
> > +		return;
> > +
> > +	if (map->page)
> > +		kunmap(map->page);
> > +	else
> > +		memunmap(map->hva);
> > +
> > +	if (dirty)
> 
> 
> I am wondering if this would also be the right place for
> 
> kvm_vcpu_mark_page_dirty() to mark the page dirty for migration.

I did consider this. However, either I am missing something or the
mark_page_dirty call is accidentally missing in a couple of places. For example:

1) When the EVMCS page is unmapped (in nested_release_evmcs), where is it
   marked dirty?
2) For the mappings in svm.c, where is the page marked dirty?

However, it is handled properly in the rest:

3) For emulator_cmpxchg_emulated it is done, so no problem here.
4) For the L12 posted interrupts it is done, so no problem here.
5) For the virtual APIC page it is done, so no problem here.

Is there any reason why it would not be needed in 1 and 2 above other than
being a bug?

That being said, good point. I will merge your suggestion in v6 when I rebase 
again :)
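
For illustration only, the suggestion can be modelled in userspace as
follows. This is a rough sketch, not kernel code: every model_* name is
hypothetical, a plain array stands in for guest memory, and a bool array
stands in for the memslot dirty bitmap that kvm_vcpu_mark_page_dirty()
would update. It only shows the shape of "unmap with dirty=true also logs
the gfn as dirty for migration".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of these names exist in the kernel. */
#define MODEL_NPAGES	64
#define MODEL_PAGE_SIZE	4096

struct model_map {
	void *hva;	/* non-NULL while the mapping is live */
	uint64_t gfn;	/* guest frame number that was mapped */
};

/* Stand-ins for guest memory and the per-slot dirty log. */
static unsigned char model_guest_mem[MODEL_NPAGES][MODEL_PAGE_SIZE];
static bool model_dirty_log[MODEL_NPAGES];

/* Map one guest frame; returns 0 on success, -EINVAL if out of range. */
static int model_map_gfn(uint64_t gfn, struct model_map *map)
{
	if (gfn >= MODEL_NPAGES)
		return -EINVAL;

	map->hva = model_guest_mem[gfn];
	map->gfn = gfn;
	return 0;
}

/*
 * Unmap; if the caller wrote to the page, also record the gfn in the
 * dirty log so that a migration pass would pick it up.
 */
static void model_unmap(struct model_map *map, bool dirty)
{
	if (!map->hva)
		return;

	assert(map->gfn < MODEL_NPAGES);
	if (dirty)
		model_dirty_log[map->gfn] = true;

	map->hva = NULL;
}
```

With this shape, callers that modify the page only have to pass dirty=true
on unmap and cannot forget the dirty-log update separately.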

> 
> > 
> > +		kvm_release_pfn_dirty(map->pfn);
> > +	else
> > +		kvm_release_pfn_clean(map->pfn);
> > +	map->hva = NULL;
> 
> > 
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
> > +
> >  struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
> >  {
> >  	kvm_pfn_t pfn;
> > 
> 
> 
