Message-ID: <81d73c4c-28c4-4fa0-bc71-aef6429e2c31@redhat.com>
Date: Thu, 22 May 2025 09:22:50 +0200
From: David Hildenbrand <david@...hat.com>
To: lizhe.67@...edance.com, alex.williamson@...hat.com
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
 peterx@...hat.com
Subject: Re: [PATCH v4] vfio/type1: optimize vfio_pin_pages_remote() for large
 folio

On 22.05.25 05:49, lizhe.67@...edance.com wrote:
> On Wed, 21 May 2025 13:17:11 -0600, alex.williamson@...hat.com wrote:
> 
>>> From: Li Zhe <lizhe.67@...edance.com>
>>>
>>> When vfio_pin_pages_remote() is called with a range of addresses that
>>> includes large folios, the function currently performs individual
>>> statistics counting operations for each page. This can lead to significant
>>> performance overheads, especially when dealing with large ranges of pages.
>>>
>>> This patch optimizes this process by batching the statistics-counting
>>> operations.
>>>
>>> The performance test results for completing the 8G VFIO IOMMU DMA mapping,
>>> obtained through trace-cmd, are as follows. In this case, the 8G virtual
>>> address space has been mapped to physical memory using hugetlbfs with
>>> pagesize=2M.
>>>
>>> Before this patch:
>>> funcgraph_entry:      # 33813.703 us |  vfio_pin_map_dma();
>>>
>>> After this patch:
>>> funcgraph_entry:      # 16071.378 us |  vfio_pin_map_dma();
>>>
>>> Signed-off-by: Li Zhe <lizhe.67@...edance.com>
>>> Co-developed-by: Alex Williamson <alex.williamson@...hat.com>
>>> Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
>>> ---
>>
>> Given the discussion on v3, this is currently a Nak.  Follow-up in that
>> thread if there are further ideas how to salvage this.  Thanks,
> 
> How about the solution David mentioned: checking whether the pages or
> PFNs are actually consecutive?
> 
> I made a preliminary attempt, and performance testing showed a time
> consumption of approximately 18,000 microseconds. Compared to the
> previous 33,000 microseconds, this is still a significant improvement.
> 
> The modification is quite straightforward. The code below reflects the
> changes I have made based on this patch.
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index bd46ed9361fe..1cc1f76d4020 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -627,6 +627,19 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
>          return ret;
>   }
>   
> +static inline long continuous_page_num(struct vfio_batch *batch, long npage)
> +{
> +       long i;
> +       unsigned long next_pfn = page_to_pfn(batch->pages[batch->offset]) + 1;
> +
> +       for (i = 1; i < npage; ++i) {
> +               if (page_to_pfn(batch->pages[batch->offset + i]) != next_pfn)
> +                       break;
> +               next_pfn++;
> +       }
> +       return i;
> +}


What might be faster is obtaining the folio, then calculating the 
expected next page pointer and comparing whether the page pointers match.

Essentially, using folio_page() to calculate the expected next page.

nth_page() is simple pointer arithmetic with CONFIG_SPARSEMEM_VMEMMAP, 
so that might be rather fast.


So we'd obtain

start_idx = folio_page_idx(folio, batch->pages[batch->offset]);

and then check for

batch->pages[batch->offset + i] == folio_page(folio, start_idx + i)
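
Put together, a rough userspace sketch of that comparison (struct page 
and folio_page() are mocked here as plain pointer arithmetic into one 
contiguous vmemmap-like array, which is the CONFIG_SPARSEMEM_VMEMMAP 
case; the names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* Mocked struct page; contents don't matter for the pointer comparison. */
struct page { unsigned long dummy; };

/*
 * Model of folio_page(): with a vmemmap, the folio's pages are contiguous
 * struct page entries, so page n is just base + n.
 */
static inline struct page *folio_page_model(struct page *folio_base, size_t n)
{
	return folio_base + n;
}

/*
 * Count how many entries in pages[], starting at offset, are consecutive
 * pages of the folio beginning at start_idx (the role continuous_page_num()
 * plays in the patch, but comparing page pointers instead of PFNs).
 */
static long count_contiguous(struct page *folio_base, size_t start_idx,
			     struct page **pages, size_t offset, long npage)
{
	long i;

	for (i = 0; i < npage; i++)
		if (pages[offset + i] != folio_page_model(folio_base,
							  start_idx + i))
			break;
	return i;
}
```

Each iteration is then a pointer add and compare, with no page_to_pfn() 
translation per page.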

-- 
Cheers,

David / dhildenb
