Date:	Thu, 13 Dec 2012 11:31:52 +0000
From:	Mats Petersson <mats.petersson@...rix.com>
To:	Vasiliy Tolstov <v.tolstov@...fip.ru>
CC:	"xen-devel@...ts.xen.org" <xen-devel@...ts.xen.org>,
	Ian Campbell <Ian.Campbell@...rix.com>,
	"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"JBeulich@...e.com" <JBeulich@...e.com>,
	David Vrabel <david.vrabel@...rix.com>,
	"mats@...netcatfish.com" <mats@...netcatfish.com>
Subject: Re: [Xen-devel] [PATCH V4] xen/privcmd: improve-performance-of-mapping-of-guest

On 13/12/12 11:27, Vasiliy Tolstov wrote:
> If I apply this patch to kernel 3.6.7, do I need to apply any other
> patches for it to work?
No, it should "just work". Obviously, if it doesn't, let me know.

--
Mats
>
> 2012/12/13 Mats Petersson <mats.petersson@...rix.com>:
>> One comment asked for more details on the improvements:
>> Using a small test program to map Guest memory into Dom0 (repeatedly
>> for "Iterations" mapping the same first "Num Pages")
>> Iterations    Num Pages    Time 3.7rc4  Time With this patch
>> 5000          4096         76.107       37.027
>> 10000         2048         75.703       37.177
>> 20000         1024         75.893       37.247
>> So a little better than twice as fast.
>>
>> Using this patch in migration, using "time" to measure the overall
>> time it take to migrate a guest (Debian Squeeze 6.0, 1024MB guest
>> memory, one network card, one disk, guest at login prompt):
>> Time 3.7rc5             Time With this patch
>> 6.697                   5.667
>> Since migration involves a whole lot of other things, it's only about
>> 15% faster - but still a good improvement. Similar measurement with a
>> guest that is running code to "dirty" memory shows about 23%
>> improvement, as it spends more time copying dirtied memory.
>>
>> As discussed elsewhere, a good deal more can be had from improving the
>> munmap system call, but it is a little tricky to get that in without
>> making things worse for non-PVOPS kernels, so I will have another look at it.
>>
>> ---
>> Update since last posting:
>> I have just run some benchmarks of a 16GB guest, and the improvement
>> with this patch is around 23-30% for the overall copy time, and 42%
>> shorter downtime on a "busy" guest (writing in a loop to about 8GB of
>> memory). I think this is definitely worth having.
>>
>> The V4 patch consists of cosmetic adjustments (a spelling mistake in a
>> comment fixed, a condition in an if-statement reversed to avoid an
>> "else" branch, and a stray empty line added by accident reverted to its
>> previous state). Thanks to Jan Beulich for the comments.
>>
>> The V3 of the patch contains suggested improvements from:
>> David Vrabel - make it two distinct external functions, doc-comments.
>> Ian Campbell - use one common function for the main work.
>> Jan Beulich  - found a bug and pointed out some whitespace problems.
>>
>>
>>
>> Signed-off-by: Mats Petersson <mats.petersson@...rix.com>
>>
>> ---
>>   arch/x86/xen/mmu.c    |  132 +++++++++++++++++++++++++++++++++++++++++++------
>>   drivers/xen/privcmd.c |   55 +++++++++++++++++----
>>   include/xen/xen-ops.h |    5 ++
>>   3 files changed, 169 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index dcf5f2d..a67774f 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> @@ -2477,7 +2477,8 @@ void __init xen_hvm_init_mmu_ops(void)
>>   #define REMAP_BATCH_SIZE 16
>>
>>   struct remap_data {
>> -       unsigned long mfn;
>> +       unsigned long *mfn;
>> +       bool contiguous;
>>          pgprot_t prot;
>>          struct mmu_update *mmu_update;
>>   };
>> @@ -2486,7 +2487,13 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
>>                                   unsigned long addr, void *data)
>>   {
>>          struct remap_data *rmd = data;
>> -       pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
>> +       pte_t pte = pte_mkspecial(pfn_pte(*rmd->mfn, rmd->prot));
>> +       /* If we have a contiguous range, just update the mfn itself,
>> +          else update pointer to be "next mfn". */
>> +       if (rmd->contiguous)
>> +               (*rmd->mfn)++;
>> +       else
>> +               rmd->mfn++;
>>
>>          rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
>>          rmd->mmu_update->val = pte_val_ma(pte);
>> @@ -2495,16 +2502,34 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
>>          return 0;
>>   }
>>
>> -int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>> -                              unsigned long addr,
>> -                              unsigned long mfn, int nr,
>> -                              pgprot_t prot, unsigned domid)
>> -{
>> +/* do_remap_mfn() - helper function to map foreign pages
>> + * @vma:     the vma for the pages to be mapped into
>> + * @addr:    the address at which to map the pages
>> + * @mfn:     pointer to array of MFNs to map
>> + * @nr:      the number of entries in the MFN array
>> + * @err_ptr: pointer to an array of per-page error codes, or NULL
>> + * @prot:    page protection mask
>> + * @domid:   id of the domain that we are mapping from
>> + *
>> + * This function takes an array of mfns and maps nr pages from that into
>> + * this kernel's memory. The owner of the pages is defined by domid. Where the
>> + * pages are mapped is determined by addr, and vma is used for "accounting" of
>> + * the pages.
>> + *
>> + * Return value is zero for success, negative for failure.
>> + *
>> + * Note that err_ptr is used to indicate whether *mfn
>> + * is a list or a "first mfn of a contiguous range". */
>> +static int do_remap_mfn(struct vm_area_struct *vma,
>> +                       unsigned long addr,
>> +                       unsigned long *mfn, int nr,
>> +                       int *err_ptr, pgprot_t prot,
>> +                       unsigned domid)
>> +{
>> +       int err, last_err = 0;
>>          struct remap_data rmd;
>>          struct mmu_update mmu_update[REMAP_BATCH_SIZE];
>> -       int batch;
>>          unsigned long range;
>> -       int err = 0;
>>
>>          if (xen_feature(XENFEAT_auto_translated_physmap))
>>                  return -EINVAL;
>> @@ -2515,9 +2540,15 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>
>>          rmd.mfn = mfn;
>>          rmd.prot = prot;
>> +       /* We use the err_ptr to indicate whether we are doing a contiguous
>> +        * mapping or a discontiguous mapping. */
>> +       rmd.contiguous = !err_ptr;
>>
>>          while (nr) {
>> -               batch = min(REMAP_BATCH_SIZE, nr);
>> +               int index = 0;
>> +               int done = 0;
>> +               int batch = min(REMAP_BATCH_SIZE, nr);
>> +               int batch_left = batch;
>>                  range = (unsigned long)batch << PAGE_SHIFT;
>>
>>                  rmd.mmu_update = mmu_update;
>> @@ -2526,19 +2557,92 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>                  if (err)
>>                          goto out;
>>
>> -               err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
>> -               if (err < 0)
>> -                       goto out;
>> +               /* We record the error for each page that gives an error, but
>> +                * continue mapping until the whole set is done */
>> +               do {
>> +                       err = HYPERVISOR_mmu_update(&mmu_update[index],
>> +                                                   batch_left, &done, domid);
>> +                       if (err < 0) {
>> +                               if (!err_ptr)
>> +                                       goto out;
>> +                               /* increment done so we skip the error item */
>> +                               done++;
>> +                               last_err = err_ptr[index] = err;
>> +                       }
>> +                       batch_left -= done;
>> +                       index += done;
>> +               } while (batch_left);
>>
>>                  nr -= batch;
>>                  addr += range;
>> +               if (err_ptr)
>> +                       err_ptr += batch;
>>          }
>>
>> -       err = 0;
>> +       err = last_err;
>>   out:
>>
>>          xen_flush_tlb_all();
>>
>>          return err;
>>   }
>> +
>> +/* xen_remap_domain_mfn_range() - Used to map foreign pages
>> + * @vma:   the vma for the pages to be mapped into
>> + * @addr:  the address at which to map the pages
>> + * @mfn:   the first MFN to map
>> + * @nr:    the number of consecutive mfns to map
>> + * @prot:  page protection mask
>> + * @domid: id of the domain that we are mapping from
>> + *
>> + * This function takes an mfn and maps nr consecutive pages from it into this kernel's
>> + * memory. The owner of the pages is defined by domid. Where the pages are
>> + * mapped is determined by addr, and vma is used for "accounting" of the
>> + * pages.
>> + *
>> + * Return value is zero for success, negative ERRNO value for failure.
>> + */
>> +int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>> +                              unsigned long addr,
>> +                              unsigned long mfn, int nr,
>> +                              pgprot_t prot, unsigned domid)
>> +{
>> +       return do_remap_mfn(vma, addr, &mfn, nr, NULL, prot, domid);
>> +}
>>   EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>> +
>> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
>> + * @vma:     the vma for the pages to be mapped into
>> + * @addr:    the address at which to map the pages
>> + * @mfn:     pointer to array of MFNs to map
>> + * @nr:      the number of entries in the MFN array
>> + * @err_ptr: pointer to array of integers, one per MFN, for an error
>> + *           value for each page. The err_ptr must not be NULL.
>> + * @prot:    page protection mask
>> + * @domid:   id of the domain that we are mapping from
>> + *
>> + * This function takes an array of mfns and maps nr pages from that into this
>> + * kernel's memory. The owner of the pages is defined by domid. Where the pages
>> + * are mapped is determined by addr, and vma is used for "accounting" of the
>> + * pages. The err_ptr array is filled in for any page that is not successfully
>> + * mapped in.
>> + *
>> + * Return value is zero for success, negative ERRNO value for failure.
>> + * Note that the error value -ENOENT is considered a "retry", so when this
>> + * error code is seen, another call should be made with the list of pages that
>> + * are marked as -ENOENT in the err_ptr array.
>> + */
>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
>> +                              unsigned long addr,
>> +                              unsigned long *mfn, int nr,
>> +                              int *err_ptr, pgprot_t prot,
>> +                              unsigned domid)
>> +{
>> +       /* We BUG_ON because passing a NULL err_ptr is a programmer error,
>> +        * and the consequence later on, "wrong memory was mapped in", is
>> +        * quite hard to trace back to its actual cause.
>> +        */
>> +       BUG_ON(err_ptr == NULL);
>> +       return do_remap_mfn(vma, addr, mfn, nr, err_ptr, prot, domid);
>> +}
>> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 71f5c45..75f6e86 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -151,6 +151,40 @@ static int traverse_pages(unsigned nelem, size_t size,
>>          return ret;
>>   }
>>
>> +/*
>> + * Similar to traverse_pages, but use each page as a "block" of
>> + * data to be processed as one unit.
>> + */
>> +static int traverse_pages_block(unsigned nelem, size_t size,
>> +                               struct list_head *pos,
>> +                               int (*fn)(void *data, int nr, void *state),
>> +                               void *state)
>> +{
>> +       void *pagedata;
>> +       unsigned pageidx;
>> +       int ret = 0;
>> +
>> +       BUG_ON(size > PAGE_SIZE);
>> +
>> +       pageidx = PAGE_SIZE;
>> +
>> +       while (nelem) {
>> +               int nr = (PAGE_SIZE/size);
>> +               struct page *page;
>> +               if (nr > nelem)
>> +                       nr = nelem;
>> +               pos = pos->next;
>> +               page = list_entry(pos, struct page, lru);
>> +               pagedata = page_address(page);
>> +               ret = (*fn)(pagedata, nr, state);
>> +               if (ret)
>> +                       break;
>> +               nelem -= nr;
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>>   struct mmap_mfn_state {
>>          unsigned long va;
>>          struct vm_area_struct *vma;
>> @@ -250,7 +284,7 @@ struct mmap_batch_state {
>>           *      0 for no errors
>>           *      1 if at least one error has happened (and no
>>           *          -ENOENT errors have happened)
>> -        *      -ENOENT if at least 1 -ENOENT has happened.
>> +        *      -ENOENT if at least one -ENOENT has happened.
>>           */
>>          int global_error;
>>          /* An array for individual errors */
>> @@ -260,17 +294,20 @@ struct mmap_batch_state {
>>          xen_pfn_t __user *user_mfn;
>>   };
>>
>> -static int mmap_batch_fn(void *data, void *state)
>> +static int mmap_batch_fn(void *data, int nr, void *state)
>>   {
>>          xen_pfn_t *mfnp = data;
>>          struct mmap_batch_state *st = state;
>>          int ret;
>>
>> -       ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
>> -                                        st->vma->vm_page_prot, st->domain);
>> +       BUG_ON(nr < 0);
>>
>> -       /* Store error code for second pass. */
>> -       *(st->err++) = ret;
>> +       ret = xen_remap_domain_mfn_array(st->vma,
>> +                                        st->va & PAGE_MASK,
>> +                                        mfnp, nr,
>> +                                        st->err,
>> +                                        st->vma->vm_page_prot,
>> +                                        st->domain);
>>
>>          /* And see if it affects the global_error. */
>>          if (ret < 0) {
>> @@ -282,7 +319,7 @@ static int mmap_batch_fn(void *data, void *state)
>>                                  st->global_error = 1;
>>                  }
>>          }
>> -       st->va += PAGE_SIZE;
>> +       st->va += PAGE_SIZE * nr;
>>
>>          return 0;
>>   }
>> @@ -378,8 +415,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>>          state.err           = err_array;
>>
>>          /* mmap_batch_fn guarantees ret == 0 */
>> -       BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
>> -                            &pagelist, mmap_batch_fn, &state));
>> +       BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>> +                                   &pagelist, mmap_batch_fn, &state));
>>
>>          up_write(&mm->mmap_sem);
>>
>> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
>> index 6a198e4..22cad75 100644
>> --- a/include/xen/xen-ops.h
>> +++ b/include/xen/xen-ops.h
>> @@ -29,4 +29,9 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>                                 unsigned long mfn, int nr,
>>                                 pgprot_t prot, unsigned domid);
>>
>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
>> +                              unsigned long addr,
>> +                              unsigned long *mfn, int nr,
>> +                              int *err_ptr, pgprot_t prot,
>> +                              unsigned domid);
>>   #endif /* INCLUDE_XEN_OPS_H */
>> --
>> 1.7.9.5
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@...ts.xen.org
>> http://lists.xen.org/xen-devel
>
>
> --
> Vasiliy Tolstov,
> Clodo.ru
> e-mail: v.tolstov@...fip.ru
> jabber: vase@...fip.ru
