lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250124191109.205955-3-pbonzini@redhat.com>
Date: Fri, 24 Jan 2025 14:11:09 -0500
From: Paolo Bonzini <pbonzini@...hat.com>
To: linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org
Cc: seanjc@...gle.com
Subject: [PATCH 2/2] Documentation: explain issues with taking locks inside kvm_lock

kvm_lock should be used sparingly, and it is easy to protect
vm_list walks with kvm_get_kvm and kvm_put_kvm.  Make it
a hard rule to drop kvm_lock before taking another mutex,
and document it.

Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
---
 Documentation/virt/kvm/locking.rst | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index c56d5f26c750..f94aad9b95ab 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -26,13 +26,6 @@ The acquisition orders for mutexes are as follows:
   are taken on the waiting side when modifying memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
-cpus_read_lock() vs kvm_lock:
-
-- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
-  being the official ordering, as it is quite easy to unknowingly trigger
-  cpus_read_lock() while holding kvm_lock.  Use caution when walking vm_list,
-  e.g. avoid complex operations when possible.
-
 For SRCU:
 
 - ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
@@ -59,6 +52,23 @@ On x86:
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
+In particular no other mutex should be taken inside kvm_lock, and the
+amount of code that can be run inside kvm_lock should be limited; this
+is because ``cpus_read_lock()`` might be triggered unknowingly and cause
+a circular dependency.  For example, if you take ``kvm->slots_lock``
+inside ``kvm_lock``, the following can happen on x86:
+
+- ``kvm->srcu`` is synchronized with ``kvm->slots_lock`` taken
+- you wait for ``kvm->slots_lock`` with ``kvm_lock`` taken
+- ``__kvmclock_cpufreq_notifier()`` waits for ``kvm_lock`` and
+  is called within ``cpus_read_lock()``.
+- ``KVM_RUN`` can trigger static key updates, which call ``cpus_read_lock()``,
+  with ``kvm->srcu`` taken
+- therefore ``synchronize_srcu(&kvm->srcu)`` never completes.
+
+This rule applies to all architectures.
+
+
 2. Exception
 ------------
 
@@ -238,6 +248,9 @@ time it will be set using the Dirty tracking mechanism described above.
 :Type:		mutex
 :Arch:		any
 :Protects:	- vm_list
+                - kvm_createvm_count
+                - kvm_active_vms
+:Comment:       Do not take any mutex inside.
 
 ``kvm_usage_lock``
 ^^^^^^^^^^^^^^^^^^
-- 
2.43.5


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ