Message-ID: <20200219090417.GA30338@joy-OptiPlex-7040>
Date:   Wed, 19 Feb 2020 04:04:17 -0500
From:   Yan Zhao <yan.y.zhao@...el.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/3] vfio/type1: Reduce vfio_iommu.lock contention

On Fri, Jan 17, 2020 at 09:10:51AM +0800, Yan Zhao wrote:
> Thank you, Alex!
> I'll try it and let you know the result soon. :)
> 
> On Fri, Jan 17, 2020 at 02:17:49AM +0800, Alex Williamson wrote:
> > Hi Yan,
> > 
> > I wonder if this might reduce the lock contention you're seeing in the
> > vfio_dma_rw series.  These are only compile tested on my end, so I hope
> > they're not too broken to test.  Thanks,
> > 
> > Alex
> > 
> > ---
> > 
> > Alex Williamson (3):
> >       vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
> >       vfio/type1: Replace obvious read lock instances
> >       vfio/type1: Introduce pfn_list mutex
> > 
> > 
> >  drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
> >  1 file changed, 41 insertions(+), 26 deletions(-)
> >

hi Alex
I have finished testing this series.
It's quite stable and passed our MTBF testing :)

However, after comparing the performance data obtained from several
benchmarks in the guests (see below), it seems that this series does not
bring an obvious benefit (at least in the cases we have tested, though I
cannot fully explain why yet).
So, do you think it's reasonable for me not to include this series in my
next version of "use vfio_dma_rw to read/write IOVAs from CPU side"?


B: baseline code, where a mutex is used for vfio_iommu.lock
B+S: baseline code + the rwsem patches that convert vfio_iommu.lock from
a mutex to an rwsem
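
For reference, below is a minimal sketch of the kind of conversion the
B+S configuration applies. This is not the actual vfio_iommu_type1.c
code; the structure and function names (example_iommu, example_lookup,
example_map) are illustrative only.

#include <linux/rwsem.h>

struct example_iommu {
        struct rw_semaphore lock;       /* was: struct mutex lock; */
        /* ... dma/domain lists protected by the lock ... */
};

/* Read-mostly path: multiple lookups may now run concurrently. */
static void example_lookup(struct example_iommu *iommu)
{
        down_read(&iommu->lock);        /* was: mutex_lock() */
        /* ... walk/read shared state ... */
        up_read(&iommu->lock);          /* was: mutex_unlock() */
}

/* Update path (e.g. map/unmap): still takes the lock exclusively. */
static void example_map(struct example_iommu *iommu)
{
        down_write(&iommu->lock);
        /* ... modify shared state ... */
        up_write(&iommu->lock);
}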

==== comparison: benchmark scores ====
(1) with 1 VM:

 score  |     glmark2    |   lightsmark    |   openarena
-----------------------------------------------------------
      B | 1248 (100%)    | 219.70 (100%)   | 114.9 (100%)
    B+S | 1252 (100.3%)  | 222.76 (101.2%) | 114.8 ( 99.9%)


(2) with 2 VMs:

 score  |     glmark2    |   lightsmark    |   openarena
-----------------------------------------------------------
      B | 812   (100%)   | 211.46 (100%)   | 115.3 (100%)
    B+S | 812.8 (100.1%) | 212.96 (100.7%) | 114.9 ( 99.6%)


==== comparison: average cycles spent on vfio_iommu.lock =====
(1) with 1 VM:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
      B | 107       | 113        | 110       | 107
    B+S | 112 (+5)  | 111  (-2)  | 108 (-2)  | 104 (-3)

Note:
a. during VM boot up, for the rwsem, there are 24921 read acquisitions
vs 67 write acquisitions (372:1)
b. for the 3 measured benchmarks, there are no write acquisitions of the
rwsem.


(2) with 2 VMs:

 cycles | glmark2   | lightsmark | openarena | VM boot up
----------------------------------------------------------
      B | 113       | 119        | 112       | 119
    B+S | 118 (+5)  | 138  (+19) | 110 (-2)  | 114 (-5)


Similar results were obtained after applying the vfio_dma_rw patches.

B: baseline code, where a mutex is used for vfio_iommu.lock
B+V: baseline code + patches that convert from kvm_read/write_guest to
vfio_dma_rw
B+V+S: baseline code + the vfio_dma_rw patches + the rwsem patches
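
For clarity, here is a rough sketch of what the B+V change looks like
from a vendor driver's point of view: guest memory is accessed through
the vfio IOMMU mapping (by IOVA) instead of through KVM (by GPA). The
vfio_dma_rw() prototype is the one proposed in my series and may still
change; the helper names below are illustrative only.

/* Before (B): GPA-based access through KVM. */
static int fetch_desc_kvm(struct kvm *kvm, gpa_t gpa, void *buf,
                          unsigned long len)
{
        return kvm_read_guest(kvm, gpa, buf, len);
}

/* After (B+V): IOVA-based access through the vfio group. */
static int fetch_desc_vfio(struct vfio_group *group, dma_addr_t iova,
                           void *buf, size_t len)
{
        return vfio_dma_rw(group, iova, buf, len, false /* read */);
}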

==== comparison: benchmark scores =====
(1) with 1 VM:

 score  |     glmark2    |   lightsmark    |   openarena
----------------------------------------------------------
    B+V | 1244 (100%)    | 222.18 (100%)   | 114.4 (100%)
  B+V+S | 1241 ( 99.8%)  | 223.90 (100.8%) | 114.6 (100.2%)

(2) with 2 VMs:

 score  |     glmark2    |   lightsmark    |   openarena
----------------------------------------------------------
    B+V | 811.2 (100%)   | 211.20 (100%)   | 115.4 (100%)
  B+V+S | 811   (99.98%) | 211.81 (100.3%) | 115.5 (100.1%)


==== comparison: average cycles spent on vfio_dma_rw =====
(1) with 1 VM:

cycles  |    glmark2  | lightsmark | openarena
--------------------------------------------------
    B+V | 1396        | 1592       | 1351 
  B+V+S | 1415 (+19 ) | 1650 (+58) | 1357 (+6)

(2) with 2 VMs:

cycles  |    glmark2  | lightsmark | openarena
--------------------------------------------------
    B+V | 1974        | 2024       | 1636
  B+V+S | 1979 (+5)   | 2051 (+27) | 1644 (+8)


==== comparison: average cycles spent on vfio_iommu.lock =====
(1) with 1 VM:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
    B+V | 137       | 139        | 156       | 124
  B+V+S | 142 (+5)  | 143 (+4)   | 149 (-7)  | 114 (-10)

(2) with 2 VMs:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
    B+V | 153       | 148        | 146       | 111
  B+V+S | 155 (+2)  | 157 (+9)   | 156 (+10) | 118 (+7)


P.S.
You may find some inconsistencies when comparing with the test results I
sent at https://lkml.org/lkml/2020/1/14/1486. This is because I had to
change my test machine for personal reasons, and also because I
configured lightsmark not to sync on vblank events.


Thanks
Yan

