[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BYAPR21MB168846628450D6089242D3EAD72CA@BYAPR21MB1688.namprd21.prod.outlook.com>
Date: Thu, 6 Jul 2023 16:48:32 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"dave.hansen@...el.com" <dave.hansen@...el.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>
CC: Dexuan Cui <decui@...rosoft.com>,
"rick.p.edgecombe@...el.com" <rick.p.edgecombe@...el.com>,
"sathyanarayanan.kuppuswamy@...ux.intel.com"
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad()
issue
From: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> Sent: Tuesday, June 6, 2023 2:56 AM
>
> During review of TDX guests on Hyper-V patchset Dave pointed to the
> potential race between changing page private/shared status and
> load_unaligned_zeropad().
>
> Fix the issue.
>
> v3:
> - Fix grammar;
> - Add Sathya's Reviewed-bys;
> v2:
> - Add more info in commit message of the first patch.
> - Move enc_status_change_finish_noop() into a separate patch.
> - Fix typo in commit message and comment.
>
> Kirill A. Shutemov (3):
> x86/mm: Allow guest.enc_status_change_prepare() to fail
> x86/tdx: Fix race between set_memory_encrypted() and
> load_unaligned_zeropad()
> x86/mm: Fix enc_status_change_finish_noop()
>
> arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++--
> arch/x86/include/asm/x86_init.h | 2 +-
> arch/x86/kernel/x86_init.c | 4 +--
> arch/x86/mm/mem_encrypt_amd.c | 4 ++-
> arch/x86/mm/pat/set_memory.c | 3 +-
> 5 files changed, 69 insertions(+), 8 deletions(-)
>
> --
> 2.39.3
These fixes notwithstanding, load_unaligned_zeropad() is not handled
properly in a TDX VM. The problem was introduced with commit
c4e34dd99f2e, which moved the fixup code to function
ex_handler_zeropad(). This new function does a verification against
fault_addr, and the verification always fails because fault_addr is zero.
The call sequence is:
exc_virtualization_exception()
ve_raise_fault()
gp_try_fixup_and_notify() <-- always passes 0 as fault_addr
fixup_exception()
ex_handler_zeropad()
The validation of fault_addr could probably be removed since
such validation wasn't there prior to c4e34dd99f2e. But before
going down that path, I want to propose a different top-level
solution to the interaction between load_unaligned_zeropad()
and CoCo VM page transitions between private and shared.
When a page is transitioning, the caller can and should ensure
that it is not being accessed during the transition. But we have
the load_unaligned_zeropad() wildcard. So do the following for
the transition sequence in __set_memory_enc_pgtable():
1. Remove aliasing mappings
2. Remove the PRESENT bit from the PTEs of all transitioning pages
3. Flush the TLB globally
4. Flush the data cache if needed
5. Set/clear the encryption attribute as appropriate
6. Notify the hypervisor of the page status change
7. Add back the PRESENT bit
With this approach, load_unaligned_zeropad() just takes the
normal page-fault-based fixup path if it touches a page that is
transitioning. As a result, load_unaligned_zeropad() and CoCo
VM page transitioning are completely decoupled. We don't
need to handle architecture-specific CoCo VM exceptions and
fix things up.
I've posted an RFC PATCH that implements this approach [1],
and tested on TDX VMs and SEV-SNP VMs in vTOM mode.
The RFC PATCH has more details on the benefits and
implications. Follow-up discussion should probably be done
on that email thread.
Michael
[1] https://lore.kernel.org/lkml/1688661719-60329-1-git-send-email-mikelley@microsoft.com/T/#u
Powered by blists - more mailing lists