lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b40f244f50ce3a14d637fd1769a9b3f709b0842e.camel@infradead.org>
Date: Fri, 02 Aug 2024 13:03:16 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Carsten Stollmaier <stollmc@...zon.com>, Sean Christopherson
 <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner
 <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov
 <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
 "H. Peter Anvin" <hpa@...or.com>
Cc: nh-open-source@...zon.com, Peter Xu <peterx@...hat.com>, Sebastian
 Biemueller <sbiemue@...zon.de>, kvm@...r.kernel.org,
 linux-kernel@...r.kernel.org, Matthew Wilcox <willy@...radead.org>, Andrew
 Morton <akpm@...ux-foundation.org>,  "linux-mm@...ck.org"
 <linux-mm@...ck.org>
Subject: Re: [PATCH] KVM: x86: Use gfn_to_pfn_cache for steal_time

On Fri, 2024-08-02 at 11:44 +0000, Carsten Stollmaier wrote:
> On vcpu_run, before entering the guest, the update of the steal time
> information causes a page-fault if the page is not present. In our
> scenario, this gets handled by do_user_addr_fault and successively
> handle_userfault since we have the region registered to that.
> 
> handle_userfault uses TASK_INTERRUPTIBLE, so it is interruptible by
> signals. do_user_addr_fault then busy-retries it if the pending signal
> is non-fatal. This leads to contention of the mmap_lock.

The busy-loop causes so much contention on mmap_lock that post-copy
live migration fails to make progress, and is leading to failures. Yes?

> This patch replaces the use of gfn_to_hva_cache with gfn_to_pfn_cache,
> as gfn_to_pfn_cache ensures page presence for the memory access,
> preventing the contention of the mmap_lock.
> 
> Signed-off-by: Carsten Stollmaier <stollmc@...zon.com>

Reviewed-by: David Woodhouse <dwmw@...zon.co.uk>

I think this makes sense on its own, as it addresses the specific case
where KVM is *likely* to be touching a userfaulted (guest) page. And it
allows us to ditch yet another explicit asm exception handler.

We should note, though, that in terms of the original problem described
above, it's a bit of a workaround. It just means that by using
kvm_gpc_refresh() to obtain the user page, we end up in
handle_userfault() without the FAULT_FLAG_INTERRUPTIBLE flag.

(Note to self: should kvm_gpc_refresh() take fault flags, to allow
interruptible and killable modes to be selected by its caller?)


An alternative workaround (which perhaps we should *also* consider)
looked like this (plus some suitable code comment, of course):

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1304,6 +1304,8 @@ void do_user_addr_fault(struct pt_regs *regs,
         */
        if (user_mode(regs))
                flags |= FAULT_FLAG_USER;
+       else
+               flags &= ~FAULT_FLAG_INTERRUPTIBLE;
 
 #ifdef CONFIG_X86_64
        /*


That would *also* handle arbitrary copy_to_user/copy_from_user() to
userfault pages, which could theoretically hit the same busy loop.

I'm actually tempted to make user access *interruptible* though, and
either add copy_{from,to}_user_interruptible() or change the semantics
of the existing ones (which I believe are already killable).

That would require each architecture implementing interruptible
exceptions, by doing an extable lookup before the retry. Not overly
complex, but needs to be done for all architectures (although not at
once; we could live with not-yet-done architectures just remaining
killable).

Thoughts?


Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ