[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <28cfc19ac3171c270896d080f30aeda11b587bb8.camel@intel.com>
Date: Wed, 30 Aug 2023 00:02:44 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"Lutomirski, Andy" <luto@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"Christopherson,, Sean" <seanjc@...gle.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kys@...rosoft.com" <kys@...rosoft.com>,
"Cui, Dexuan" <decui@...rosoft.com>,
"mikelley@...rosoft.com" <mikelley@...rosoft.com>,
"hpa@...or.com" <hpa@...or.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"bp@...en8.de" <bp@...en8.de>,
"sathyanarayanan.kuppuswamy@...ux.intel.com"
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving
between private and shared
On Thu, 2023-07-06 at 09:41 -0700, Michael Kelley wrote:
> In a CoCo VM when a page transitions from private to shared, or vice
> versa, attributes in the PTE must be updated *and* the hypervisor
> must
> be notified of the change. Because there are two separate steps,
> there's
> a window where the settings are inconsistent. Normally the code that
> initiates the transition (via set_memory_decrypted() or
> set_memory_encrypted()) ensures that the memory is not being accessed
> during a transition, so the window of inconsistency is not a problem.
> However, the load_unaligned_zeropad() function can read arbitrary
> memory
> pages at arbitrary times, which could access a transitioning page
> during
> the window. In such a case, CoCo VM specific exceptions are taken
> (depending on the CoCo architecture in use). Current code in those
> exception handlers recovers and does "fixup" on the result returned
> by
> load_unaligned_zeropad(). Unfortunately, this exception handling and
> fixup code is tricky and somewhat fragile. At the moment, it is
> broken for both TDX and SEV-SNP.
>
> There's also a problem with the current code in paravisor scenarios:
> TDX Partitioning and SEV-SNP in vTOM mode. The exceptions need
> to be forwarded from the paravisor to the Linux guest, but there
> are no architectural specs for how to do that.
>
> To avoid these complexities of the CoCo exception handlers, change
> the core transition code in __set_memory_enc_pgtable() to do the
> following:
>
> 1. Remove aliasing mappings
> 2. Remove the PRESENT bit from the PTEs of all transitioning pages
> 3. Flush the TLB globally
> 4. Flush the data cache if needed
> 5. Set/clear the encryption attribute as appropriate
> 6. Notify the hypervisor of the page status change
> 7. Add back the PRESENT bit
>
> With this approach, load_unaligned_zeropad() just takes its normal
> page-fault-based fixup path if it touches a page that is
> transitioning.
> As a result, load_unaligned_zeropad() and CoCo VM page transitioning
> are completely decoupled. CoCo VM page transitions can proceed
> without needing to handle architecture-specific exceptions and fix
> things up. This decoupling reduces the complexity due to separate
> TDX and SEV-SNP fixup paths, and gives more freedom to revise and
> introduce new capabilities in future versions of the TDX and SEV-SNP
> architectures. Paravisor scenarios work properly without needing
> to forward exceptions.
>
> This approach may make __set_memory_enc_pgtable() slightly slower
> because of touching the PTEs three times instead of just once. But
> the run time of this function is already dominated by the hypercall
> and the need to flush the TLB at least once and maybe twice. In any
> case, this function is only used for CoCo VM page transitions, and
> is already unsuitable for hot paths.
Excluding vm_unmap_aliases(), and just looking at the TLB flushes, it
kind of looks like this:
1. Clear present
2. TLB flush
3. Set C bit
4. Set Present bit
5. TLB flush
But if you instead did:
1. Clear Present and set C bit
2. TLB flush
3. Set Present bit (no flush)
Then you could still have only 1 TLB flush and 2 operations instead of
3. Otherwise it's the same load_unaligned_zeropad() benefits you are
looking for I think. But I'm not very educated on the private<->shared
conversion HW rules though, so maybe not.
>
> The architecture specific callback function for notifying the
> hypervisor typically must translate guest kernel virtual addresses
> into guest physical addresses to pass to the hypervisor. Because
> the PTEs are invalid at the time of callback, the code for doing the
> translation needs updating. virt_to_phys() or equivalent continues
> to work for direct map addresses. But vmalloc addresses cannot use
> vmalloc_to_page() because that function requires the leaf PTE to be
> valid. Instead, slow_virt_to_phys() must be used. Both functions
> manually walk the page table hierarchy, so performance is the same.
Just curious. Are vmalloc addresses supported here? It looks like in
SEV, but not TDX.
Powered by blists - more mailing lists