Message-ID: <8c92b2f3a8e8829ec85d22091b2fe84794f12f78.camel@infradead.org>
Date: Thu, 03 Dec 2020 10:16:06 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Ankur Arora <ankur.a.arora@...cle.com>,
Joao Martins <joao.m.martins@...cle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page
On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > On IRC, Paolo told me that permanent pinning causes problems for memory
> > hotplug, and pointed me at the trick we do with an MMU notifier and
> > kvm_vcpu_reload_apic_access_page().
>
> Okay that answers my question. Thanks for clearing that up.
>
> Not sure of a good place to document this but it would be good to
> have this written down somewhere. Maybe kvm_map_gfn()?
Trying not to get too distracted by polishing this part, so I can
continue with making more things actually work. But I took a quick look
at the reload_apic_access_page() thing.
AFAICT it works because the access is only from *within* the vCPU, in
guest mode.
So all the notifier has to do is kick all CPUs, which happens when it
calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
are *out* of guest mode by the time...
...er... maybe not by the time the notifier returns, because all
we've done is *send* the IPI and we don't know the other CPUs have
actually stopped running the guest yet?
Maybe there's some explanation of why the actual TLB shootdown
truly *will* occur before the page goes away, and some ordering
rules which mean our reschedule IPI will happen first? Something
like that ideally would have been in a comment in the MMU notifier.
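
For reference, the existing hook is roughly the following (quoting from
memory, so take it as a sketch of the pattern rather than the exact
code):

void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                            unsigned long start,
                                            unsigned long end)
{
        unsigned long apic_address;

        /*
         * The physical address of the APIC access page is stored in the
         * VMCS, so it has to be updated when it becomes invalid.  The
         * request also kicks every vCPU; the actual reload happens from
         * vcpu_enter_guest() before the next guest entry.
         */
        apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
        if (start <= apic_address && apic_address < end)
                kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
}
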
Anyway, I'm not going to fret about that because it clearly doesn't
work for the pvclock/shinfo cases anyway; those are accessed from the
kernel *outside* guest mode. I think we can use a similar trick but
just protect the accesses with kvm->srcu.
The code in my tree now uses kvm_map_gfn() at setup time and leaves the
shinfo page mapped. A bit like the KVM pvclock code, except that I
don't pointlessly map and unmap all the time while leaving the page
pinned anyway.
So all we need to do is kvm_release_pfn() right after mapping it,
leaving the map->hva "unsafe".
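
Something like this (sketch only; the arch.xen.shinfo_map and
arch.xen.shinfo_gfn fields are placeholder names for whatever the series
ends up using, and I'm writing it with kvm_vcpu_map() for brevity):

static int kvm_xen_map_shinfo(struct kvm_vcpu *vcpu, gfn_t gfn)
{
        struct kvm *kvm = vcpu->kvm;
        struct kvm_host_map *map = &kvm->arch.xen.shinfo_map;
        int ret;

        /* Map the shared_info page once at setup time and cache it. */
        ret = kvm_vcpu_map(vcpu, gfn, map);
        if (ret)
                return ret;

        kvm->arch.xen.shinfo_gfn = gfn;

        /*
         * Drop the page reference straight away so the page isn't pinned.
         * map->hva is now "unsafe": it's only valid until the MMU notifier
         * tells us the page is going away.
         */
        kvm_release_pfn_clean(map->pfn);
        return 0;
}
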
Then we hook in to the MMU notifier to unmap it when it goes away (in
RCU style; clearing the pointer variable, synchronize_srcu(), and
*then* unmap, possibly in different range_start/range/range_end
callbacks).
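
Roughly (again a sketch, same placeholder names as above):

static void kvm_xen_shinfo_invalidate(struct kvm *kvm,
                                      unsigned long start, unsigned long end)
{
        unsigned long hva = gfn_to_hva(kvm, kvm->arch.xen.shinfo_gfn);

        if (hva < start || hva >= end)
                return;

        /* Readers see NULL and know they have to map the GFN back in. */
        WRITE_ONCE(kvm->arch.xen.shinfo_map.hva, NULL);

        /*
         * Wait until nobody can still be inside an srcu_read_lock(&kvm->srcu)
         * section that picked up the old hva...
         */
        synchronize_srcu(&kvm->srcu);

        /*
         * ...and only then tear the kmap itself down (the page reference was
         * already dropped at map time).  In practice that last step probably
         * has to move into a separate range_start/range_end callback.
         */
}
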
The *users* of such mappings will need an SRCU read lock on kvm->srcu,
and will need to be able to fault the GFN back in when they need it.
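
So an access from outside guest mode ends up looking something like this
(sketch, placeholder names again):

static int kvm_xen_shinfo_write_example(struct kvm_vcpu *vcpu,
                                        unsigned int offset,
                                        const void *data, unsigned int len)
{
        struct kvm *kvm = vcpu->kvm;
        void *hva;
        int idx, ret = 0;

        idx = srcu_read_lock(&kvm->srcu);

        hva = READ_ONCE(kvm->arch.xen.shinfo_map.hva);
        if (!hva) {
                /* The mapping was invalidated; fault the GFN back in. */
                ret = kvm_xen_map_shinfo(vcpu, kvm->arch.xen.shinfo_gfn);
                if (!ret)
                        hva = READ_ONCE(kvm->arch.xen.shinfo_map.hva);
        }

        if (hva)
                memcpy(hva + offset, data, len);
        else if (!ret)
                ret = -EFAULT;

        srcu_read_unlock(&kvm->srcu, idx);
        return ret;
}
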
For now, though, I'm content that converting the Xen shinfo support to
kvm_map_gfn() is the right way to go, and the memory hotplug support is
an incremental addition on top of it.
And I'm even more comforted by the observation that the KVM pvclock
support *already* pins its pages, so I'm not even failing to meet the
existing bar :)