lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 2 Oct 2023 10:42:35 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Michael Kelley <mikelley@...rosoft.com>, kys@...rosoft.com,
        haiyangz@...rosoft.com, wei.liu@...nel.org, decui@...rosoft.com,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, luto@...nel.org,
        peterz@...radead.org, sathyanarayanan.kuppuswamy@...ux.intel.com,
        kirill.shutemov@...ux.intel.com, seanjc@...gle.com,
        rick.p.edgecombe@...el.com, linux-kernel@...r.kernel.org,
        linux-hyperv@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 0/5] x86/coco: Mark CoCo VM pages not present when
 changing encrypted state

On 9/29/23 13:19, Michael Kelley wrote:
> In a CoCo VM when a page transitions from encrypted to decrypted, or vice
> versa, attributes in the PTE must be updated *and* the hypervisor must
> be notified of the change. Because there are two separate steps, there's
> a window where the settings are inconsistent.  Normally the code that
> initiates the transition (via set_memory_decrypted() or
> set_memory_encrypted()) ensures that the memory is not being accessed
> during a transition, so the window of inconsistency is not a problem.
> However, load_unaligned_zeropad() can read arbitrary memory pages at
> arbitrary times, which could read a transitioning page during the
> window.  In such a case, CoCo VM specific exceptions are taken
> (depending on the CoCo architecture in use).  Current code in those
> exception handlers recovers and does "fixup" on the result returned by
> load_unaligned_zeropad().  Unfortunately, this exception handling can't
> work in paravisor scenarios (TDX Paritioning and SEV-SNP in vTOM mode).
> The exceptions would need to be forwarded from the paravisor to the
> Linux guest, but there's no architectural spec for how to do that.
> 
> Fortunately, there's a simpler way to solve the problem by changing
> the core transition code in __set_memory_enc_pgtable() to do the
> following:
> 
> 1.  Remove aliasing mappings
> 2.  Flush the data cache if needed
> 3.  Remove the PRESENT bit from the PTEs of all transitioning pages
> 4.  Set/clear the encryption attribute as appropriate
> 5.  Flush the TLB so the changed encryption attribute isn't visible
> 6.  Notify the hypervisor of the encryption status change
> 7.  Add back the PRESENT bit, making the changed attribute visible
> 
> With this approach, load_unaligned_zeropad() just takes its normal
> page-fault-based fixup path if it touches a page that is transitioning.
> As a result, load_unaligned_zeropad() and CoCo VM page transitioning
> are completely decoupled.  CoCo VM page transitions can proceed
> without needing to handle architecture-specific exceptions and fix
> things up. This decoupling reduces the complexity due to separate
> TDX and SEV-SNP fixup paths, and gives more freedom to revise and
> introduce new capabilities in future versions of the TDX and SEV-SNP
> architectures. Paravisor scenarios work properly without needing
> to forward exceptions.
> 
> This patch set is follow-up to an RFC patch and discussion.[1]
> Compared with the RFC patch, the steps listed above are optimized for
> better performance and particularly for fewer TLB flushes.
> 
> Patch 1 handles implications of the hypervisor callbacks in Step 6
> needing to do virt-to-phys translations on pages that are temporarily
> marked not present.
> 
> Patch 2 is a performance optimization so that Step 7 doesn't generate
> a TLB flush.
> 
> Patch 3 is the core change that implements Steps 1 thru 7. It also
> simplifies the associated TDX, SEV-SNP, and Hyper-V vTOM callbacks.
> 
> Patch 4 is a somewhat tangential cleanup that removes an unnecessary
> wrapper function in the path for doing a transition.
> 
> Patch 5 adds comments describing the implications of errors when
> doing a transition.  These implications are discussed in the email
> thread for the RFC patch.
> 
> With this change, the #VE and #VC exception handlers should no longer
> be triggered for load_unaligned_zeropad() accesses, and the existing
> code in those handlers to do the "fixup" shouldn't be needed. But I
> have not removed that code in this patch set. Kirill Shutemov wants
> to keep the code for TDX #VE, so the code for #VC on the the SEV-SNP
> side has also been kept.

There isn't any code for the SEV-SNP side at the moment.

Thanks,
Tom

> 
> This patch set is based on the linux-next20230921 code tree.
> 
> [1] https://lore.kernel.org/lkml/1688661719-60329-1-git-send-email-mikelley@microsoft.com/
> 
> Michael Kelley (5):
>    x86/coco: Use slow_virt_to_phys() in page transition hypervisor
>      callbacks
>    x86/mm: Don't do a TLB flush if changing a PTE that isn't marked
>      present
>    x86/mm: Mark CoCo VM pages not present while changing encrypted state
>    x86/mm: Remove unnecessary call layer for __set_memory_enc_pgtable()
>    x86/mm: Add comments about errors in
>      set_memory_decrypted()/encrypted()
> 
>   arch/x86/coco/tdx/tdx.c       |  66 +-----------------------
>   arch/x86/hyperv/ivm.c         |  15 +++---
>   arch/x86/kernel/sev.c         |   8 ++-
>   arch/x86/kernel/x86_init.c    |   4 --
>   arch/x86/mm/mem_encrypt_amd.c |  27 +++-------
>   arch/x86/mm/pat/set_memory.c  | 114 +++++++++++++++++++++++++++++-------------
>   6 files changed, 102 insertions(+), 132 deletions(-)
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ