Message-ID: <ba7e528b-8f60-5c0d-868f-98a8fa5cf847@redhat.com>
Date:   Fri, 5 Aug 2022 11:33:30 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Leonardo Bras <leobras@...hat.com>, Peter Xu <peterx@...hat.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] kvm: Use a new spinlock to avoid atomic
 operations in kvm_get_dirty_log_protect

On 8/5/22 09:35, Leonardo Bras wrote:
> 1 - Spin-locking in mark_page_dirty_in_slot():
> I understand this function is called a lot while the guest runs and should
> probably be as fast as possible, so introducing a lock here seems
> counter-productive; but to be fair, I could not measure it being more than
> a couple of cycles slower in my current setup (an x86_64 machine).

Maybe the workload is too small?  32 vCPUs at 8 GB/s would mean 256 MB/s 
per vCPU, i.e. 65536 4-KiB pages/second per vCPU *at most*.  That might 
not create too much contention.

One possibility here is to use a global (for all VMs) 
percpu_rw_semaphore, or perhaps even RCU.  The write critical section is 
so short that it could be a win nevertheless.
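
Very roughly, and only as a sketch of what I mean (not the posted patch; 
the function names and parameters below are made up for illustration): 
vCPUs would take the per-CPU read side around the bit set, and the 
dirty-log ioctl would take the write side, so the atomic exchange per 
bitmap word in kvm_get_dirty_log_protect() could become a plain load + 
store:

#include <linux/bitops.h>
#include <linux/percpu-rwsem.h>

static DEFINE_STATIC_PERCPU_RWSEM(kvm_dirty_log_sem);

/* Hot path: the per-CPU read lock is close to free when no writer is active. */
static void mark_page_dirty_sketch(unsigned long *dirty_bitmap,
                                   unsigned long rel_gfn)
{
        percpu_down_read(&kvm_dirty_log_sem);
        set_bit_le(rel_gfn, dirty_bitmap);      /* still atomic vs. other vCPUs */
        percpu_up_read(&kvm_dirty_log_sem);
}

/* Ioctl path: with all markers excluded, no atomic exchange is needed per word. */
static void harvest_dirty_log_sketch(unsigned long *dirty_bitmap,
                                     unsigned long *buffer, long nr_longs)
{
        long i;

        percpu_down_write(&kvm_dirty_log_sem);
        for (i = 0; i < nr_longs; i++) {
                buffer[i] = dirty_bitmap[i];
                dirty_bitmap[i] = 0;            /* plain store; no markers running */
                /* ... write-protect the pages corresponding to buffer[i] ... */
        }
        percpu_up_write(&kvm_dirty_log_sem);
}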

However...

> 2 - QEMU will use 'manual_dirty_log_protect'
> I understand that more recent versions of QEMU will use
> 'manual_dirty_log_protect' when available, so this approach will not
> benefit that use case, which is quite common.
> A counter-argument would be that there are other hypervisors that could
> benefit from it, and that it also applies to older QEMU versions.

... that was my first thought indeed.  I would just consider the old API 
to be legacy and not bother with it.  Mostly because of the ability to 
clear a small part of the bitmap(*) and because of the initially-all-set 
optimization, manual dirty log ought to be superior even if 
CLEAR_DIRTY_LOG has to use atomics.
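
Concretely, the new API looks roughly like this from userspace (a rough 
sketch only: error handling, the vm_fd/memslot plumbing and the bitmap 
buffer management are omitted or assumed):

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int enable_manual_protect(int vm_fd)
{
        struct kvm_enable_cap cap = {
                .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                           KVM_DIRTY_LOG_INITIALLY_SET,
        };

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/* Clear (and re-protect) only a sub-range of the slot's dirty bitmap. */
static int clear_dirty_range(int vm_fd, __u32 slot, __u64 first_page,
                             __u32 num_pages, void *bitmap)
{
        struct kvm_clear_dirty_log clear = {
                .slot = slot,
                .first_page = first_page,      /* must be a multiple of 64 */
                .num_pages = num_pages,
                .dirty_bitmap = bitmap,        /* bits for [first_page, +num_pages) */
        };

        return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
}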

> - I am also trying to think of improvements for the
>   'manual_dirty_log_protect' use case, which seems to be very hard to
>   improve. For starters, using the same approach to remove the atomics
>   does not seem to produce any relevant speedup.

Yes, there are two issues:

1) CLEAR_DIRTY_LOG does not clear all bits, only those passed in by 
userspace.  This means that the inactive bitmap can still have some bits 
set.

2) the ability to clear only part of the bitmap makes it hard to do a 
wholesale switch in CLEAR_DIRTY_LOG; this is the dealbreaker.
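
To spell out why: the clear path may only touch the bits that userspace 
passed in, while vCPUs may be setting *other* bits in the same word at 
the same time, so per word it has to do something like the following 
(a simplified sketch, not the exact kvm_clear_dirty_log_protect() loop):

#include <linux/atomic.h>

static void clear_dirty_words_sketch(unsigned long *dirty_bitmap,
                                     unsigned long *user_bitmap,
                                     long first_word, long nr_words)
{
        long i;

        for (i = 0; i < nr_words; i++) {
                unsigned long mask = user_bitmap[i];
                atomic_long_t *p =
                        (atomic_long_t *)&dirty_bitmap[first_word + i];

                if (!mask)
                        continue;

                /*
                 * Atomically clear only the requested bits; bits set
                 * concurrently by mark_page_dirty_in_slot() survive.
                 */
                mask &= atomic_long_fetch_andnot(mask, p);

                /* ... re-protect only the pages whose bits were set ... */
        }
}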

Thanks,

Paolo

(*) with the old API, that requires the workaround of using multiple 
small memslots, e.g. 1-32 GB in size
