[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210531200712.qjxghakcaj4s6ara@box.shutemov.name>
Date: Mon, 31 May 2021 23:07:12 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Jim Mattson <jmattson@...gle.com>,
David Rientjes <rientjes@...gle.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"Kleen, Andi" <andi.kleen@...el.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>,
Erdem Aktas <erdemaktas@...gle.com>,
Steve Rutherford <srutherford@...gle.com>,
Peter Gonda <pgonda@...gle.com>,
David Hildenbrand <david@...hat.com>,
Chao Peng <chao.p.peng@...ux.intel.com>, x86@...nel.org,
kvm@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages
On Wed, May 26, 2021 at 07:46:52PM +0000, Sean Christopherson wrote:
> On Fri, May 21, 2021, Kirill A. Shutemov wrote:
> > Hi Sean,
> >
> > The core patch of the approach we've discussed before is below. It
> > introduce a new page type with the required semantics.
> >
> > The full patchset can be found here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git kvm-unmapped-guest-only
> >
> > but only the patch below is relevant for TDX. QEMU patch is attached.
>
> Can you post the whole series?
I hoped to get it posted as part of TDX host enabling.
As it is the feature is incomplete for pure KVM. I didn't implement on KVM
side checks that provided by TDX module/hardware, so nothing prevents the
same page to be added to multiple KVM instances.
> The KVM behavior and usage of FOLL_GUEST is very relevant to TDX.
The patch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=kvm-unmapped-guest-only&id=2cd6c2c20528696a46a2a59383ca81638bf856b5
> > CONFIG_HAVE_KVM_PROTECTED_MEMORY has to be changed to what is appropriate
> > for TDX and FOLL_GUEST has to be used in hva_to_pfn_slow() when running
> > TDX guest.
>
> This behavior in particular is relevant; KVM should provide FOLL_GUEST iff the
> access is private or the VM type doesn't differentiate between private and
> shared.
I added FOL_GUEST if the KVM instance has the feature enabled.
On top of that TDX-specific code has to check that the page is in fact
PageGuest() before inserting it into private SEPT.
The scheme makes sure that user-accessible memory cannot be not added as
private to TD.
> > When page get inserted into private sept we must make sure it is
> > PageGuest() or SIGBUS otherwise.
>
> More KVM feedback :-)
>
> Ideally, KVM will synchronously exit to userspace with detailed information on
> the bad behavior, not do SIGBUS. Hopefully that infrastructure will be in place
> sooner than later.
>
> https://lkml.kernel.org/r/YKxJLcg/WomPE422@google.com
My experiments are still v5.11, but I can rebase to whatever needed once
the infrastructure hits upstream.
> > Inserting PageGuest() into shared is fine, but the page will not be accessible
> > from userspace.
>
> Even if it can be functionally fine, I don't think we want to allow KVM to map
> PageGuest() as shared memory. The only reason to map memory shared is to share
> it with something, e.g. the host, that doesn't have access to private memory, so
> I can't envision a use case.
>
> On the KVM side, it's trivially easy to omit FOLL_GUEST for shared memory, while
> always passing FOLL_GUEST would require manually zapping. Manual zapping isn't
> a big deal, but I do think it can be avoided if userspace must either remap the
> hva or define a new KVM memslot (new gpa->hva), both of which will automatically
> zap any existing translations.
>
> Aha, thought of a concrete problem. If KVM maps PageGuest() into shared memory,
> then KVM must ensure that the page is not mapped private via a different hva/gpa,
> and is not mapped _any_ other guest because the TDX-Module's 1:1 PFN:TD+GPA
> enforcement only applies to private memory. The explicit "VM_WRITE | VM_SHARED"
> requirement below makes me think this wouldn't be prevented.
Hm. I didn't realize that TDX module doesn't prevent the same page to be
used as shared and private at the same time.
Omitting FOLL_GUEST for shared memory doesn't look like a right approach.
IIUC, it would require the kernel to track what memory is share and what
private, which defeat the purpose of the rework. I would rather enforce
!PageGuest() when share SEPT is populated in addition to enforcing
PageGuest() fro private SEPT.
Do you see any problems with this?
> Oh, and the other nicety is that I think it would avoid having to explicitly
> handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
> memory exposed to KVM must be !PageGuest(), then it is also eligible for
> copy_{to,from}_user().
copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE.
Or do I miss your point?
>
> > Any feedback is welcome.
> >
> > -------------------------------8<-------------------------------------------
> >
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> > Date: Fri, 16 Apr 2021 01:30:48 +0300
> > Subject: [PATCH] mm: Introduce guest-only pages
> >
> > PageGuest() pages are only allowed to be used as guest memory. Userspace
> > is not allowed read from or write to such pages.
> >
> > On page fault, PageGuest() pages produce PROT_NONE page table entries.
> > Read or write there will trigger SIGBUS. Access to such pages via
> > syscall leads to -EIO.
> >
> > The new mprotect(2) flag PROT_GUEST translates to VM_GUEST. Any page
> > fault to VM_GUEST VMA produces PageGuest() page.
> >
> > Only shared tmpfs/shmem mappings are supported.
>
> Is limiting this to tmpfs/shmem only for the PoC/RFC, or is it also expected to
> be the long-term behavior?
I expect it to be enough to cover all relevant cases, no?
Note that MAP_ANONYMOUS|MAP_SHARED also fits here.
--
Kirill A. Shutemov
Powered by blists - more mailing lists