Date:   Wed, 9 Dec 2020 09:42:37 -0500
From:   Eric Farman <farman@...ux.ibm.com>
To:     Cornelia Huck <cohuck@...hat.com>,
        "xuxiaoyang (C)" <xuxiaoyang2@...wei.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Alex Williamson <alex.williamson@...hat.com>,
        kwankhede@...dia.com, wu.wubin@...wei.com,
        maoming.maoming@...wei.com, xieyingtai@...wei.com,
        lizhengui@...wei.com, wubinfeng@...wei.com,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Zhi Wang <zhi.a.wang@...el.com>
Subject: Re: [PATCH v2] vfio iommu type1: Improve vfio_iommu_type1_pin_pages
 performance



On 12/9/20 6:54 AM, Cornelia Huck wrote:
> On Tue, 8 Dec 2020 21:55:53 +0800
> "xuxiaoyang (C)" <xuxiaoyang2@...wei.com> wrote:
> 
>> On 2020/11/21 15:58, xuxiaoyang (C) wrote:
>>> vfio_pin_pages() accepts an array of unrelated iova pfns and processes
>>> each one to return the physical pfn.  When dealing with large arrays of
>>> contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
>>> it processes them page by page.  In this case, we can divide the iova
>>> pfn array into multiple contiguous ranges and optimize those.  For
>>> example, when the iova pfn array is {1,5,6,7,9}, it is divided into
>>> three groups {1}, {5,6,7}, {9} for processing.  When processing
>>> {5,6,7}, the number of calls to pin_user_pages_remote is reduced from
>>> three to one.  For a single page or a large array of discontiguous
>>> iovas, we still use vfio_pin_page_external to handle it, to reduce the
>>> performance loss caused by the refactoring.
>>>
>>> Signed-off-by: Xiaoyang Xu <xuxiaoyang2@...wei.com>
> 
> (...)
> 
>>
>> hi Cornelia Huck, Eric Farman, Zhenyu Wang, Zhi Wang
>>
>> vfio_pin_pages() accepts an array of unrelated iova pfns and processes
>> each one to return the physical pfn.  When dealing with large arrays of
>> contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
>> it processes them page by page.  In this case, we can divide the iova
>> pfn array into multiple contiguous ranges and optimize those.  I have a
>> set of performance test data for reference.
>>
>> Without the patch applied:
>>                     1 page          512 pages
>> no huge pages:      1638ns          223651ns
>> THP:                1668ns          222330ns
>> HugeTLB:            1526ns          208151ns
>>
>> With the patch applied:
>>                     1 page          512 pages
>> no huge pages:      1735ns          167286ns
>> THP:                1934ns          126900ns
>> HugeTLB:            1713ns          102188ns
>>
>> As Alex Williamson said, this patch lacks proof that it works in the
>> real world.  I think you may have some valuable opinions on this.
> 
> Looking at this from the vfio-ccw angle, I'm not sure how much this
> would buy us, as we deal with IDAWs, which are designed so that they
> can be non-contiguous. I guess this depends a lot on what the guest
> does.

This would be my concern too, but I don't have data off the top of my 
head to say one way or another...

> 
> Eric, any opinion? Do you maybe also happen to have a test setup that
> mimics workloads actually seen in the real world?
> 

...I do have some test setups, which I will try to get some data from in 
a couple of days. At the moment I've broken most of those setups while 
implementing some other things, and can't easily revert just yet. Will 
get back to this.
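
For reference, the contiguous-range grouping described in the quoted
commit message boils down to something like the minimal standalone
sketch below. The names (pin_pfn_array, process_range) are illustrative
stand-ins, not the actual vfio_iommu_type1 code, and process_range is
only a placeholder for a single pin_user_pages_remote()-style call.

#include <stdio.h>

/* Placeholder for one pin call covering a contiguous run of pages. */
static void process_range(unsigned long start_pfn, unsigned long npages)
{
	printf("pin %lu page(s) starting at pfn %lu\n", npages, start_pfn);
}

/*
 * Split the pfn array into maximal runs of consecutive values and
 * hand each run off as a single range.
 */
static void pin_pfn_array(const unsigned long *pfns, unsigned long count)
{
	unsigned long i, start = 0;

	for (i = 1; i <= count; i++) {
		/*
		 * Close the current run when the next pfn is not adjacent
		 * to the previous one, or when the array is exhausted.
		 */
		if (i == count || pfns[i] != pfns[i - 1] + 1) {
			process_range(pfns[start], i - start);
			start = i;
		}
	}
}

int main(void)
{
	/* The example from the patch description: {1,5,6,7,9}. */
	unsigned long pfns[] = { 1, 5, 6, 7, 9 };

	pin_pfn_array(pfns, sizeof(pfns) / sizeof(pfns[0]));
	return 0;
}

With the {1,5,6,7,9} example this yields three runs: one page at pfn 1,
three pages starting at pfn 5, and one page at pfn 9, so the middle run
needs only a single pin call instead of three.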

Eric
