Message-ID: <CAHTA-ub+_txMHOG1YmtnPRnwSgU0eLrN6kjA5u4b+cJ=ja2L7Q@mail.gmail.com>
Date: Thu, 6 Feb 2025 19:39:00 -0600
From: Mitchell Augustin <mitchell.augustin@...onical.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, peterx@...hat.com, 
	clg@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org
Subject: Re: [PATCH 5/5] vfio/type1: Use mapping page mask for pfnmaps

LGTM — this completely eliminates the guest VM PCI initialization
slowdowns on H100 and A100.
I'm also not seeing any obvious regressions on my side.

Reported-by: Mitchell Augustin <mitchell.augustin@...onical.com>
Reviewed-by: Mitchell Augustin <mitchell.augustin@...onical.com>
Tested-by: Mitchell Augustin <mitchell.augustin@...onical.com>


On Wed, Feb 5, 2025 at 5:18 PM Alex Williamson
<alex.williamson@...hat.com> wrote:
>
> vfio-pci supports huge_fault for PCI MMIO BARs and will insert pud and
> pmd mappings for well aligned mappings.  follow_pfnmap_start() walks the
> page table and therefore knows the page mask of the level where the
> address is found and returns this through follow_pfnmap_args.pgmask.
> Subsequent pfns from this address until the end of the mapping page are
> necessarily consecutive.  Use this information to retrieve a range of
> pfnmap pfns in a single pass.
>
> With optimal mappings and alignment on systems with 1GB pud and 4KB
> page size, this reduces iterations for DMA mapping PCI BARs by a
> factor of 256K.  In real world testing, the overhead of iterating
> pfns for a VM DMA mapping a 32GB PCI BAR is reduced from ~1s to
> sub-millisecond overhead.
>
> Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 939920454da7..6f3e8d981311 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -520,7 +520,7 @@ static void vfio_batch_fini(struct vfio_batch *batch)
>
>  static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>                             unsigned long vaddr, unsigned long *pfn,
> -                           bool write_fault)
> +                           unsigned long *pgmask, bool write_fault)
>  {
>         struct follow_pfnmap_args args = { .vma = vma, .address = vaddr };
>         int ret;
> @@ -544,10 +544,12 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>                         return ret;
>         }
>
> -       if (write_fault && !args.writable)
> +       if (write_fault && !args.writable) {
>                 ret = -EFAULT;
> -       else
> +       } else {
>                 *pfn = args.pfn;
> +               *pgmask = args.pgmask;
> +       }
>
>         follow_pfnmap_end(&args);
>         return ret;
> @@ -590,15 +592,23 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
>         vma = vma_lookup(mm, vaddr);
>
>         if (vma && vma->vm_flags & VM_PFNMAP) {
> -               ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
> +               unsigned long pgmask;
> +
> +               ret = follow_fault_pfn(vma, mm, vaddr, pfn, &pgmask,
> +                                      prot & IOMMU_WRITE);
>                 if (ret == -EAGAIN)
>                         goto retry;
>
>                 if (!ret) {
> -                       if (is_invalid_reserved_pfn(*pfn))
> -                               ret = 1;
> -                       else
> +                       if (is_invalid_reserved_pfn(*pfn)) {
> +                               unsigned long epfn;
> +
> +                               epfn = (((*pfn << PAGE_SHIFT) + ~pgmask + 1)
> +                                       & pgmask) >> PAGE_SHIFT;
> +                               ret = min_t(int, npages, epfn - *pfn);
> +                       } else {
>                                 ret = -EFAULT;
> +                       }
>                 }
>         }
>  done:
> --
> 2.47.1
>


--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
