Message-ID: <9b388f54f13b34fe684ef77603fc878952e48f87.camel@intel.com>
Date:   Thu, 28 Apr 2022 12:37:29 +1200
From:   Kai Huang <kai.huang@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     seanjc@...gle.com, pbonzini@...hat.com, len.brown@...el.com,
        tony.luck@...el.com, rafael.j.wysocki@...el.com,
        reinette.chatre@...el.com, dan.j.williams@...el.com,
        peterz@...radead.org, ak@...ux.intel.com,
        kirill.shutemov@...ux.intel.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com,
        isaku.yamahata@...el.com
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On Wed, 2022-04-27 at 14:59 -0700, Dave Hansen wrote:
> On 4/26/22 18:15, Kai Huang wrote:
> > On Tue, 2022-04-26 at 13:13 -0700, Dave Hansen wrote:
> > > On 4/5/22 21:49, Kai Huang wrote:
> > > > SEAM VMX root operation is designed to host a CPU-attested, software
> > > > module called the 'TDX module' which implements functions to manage
> > > > crypto protected VMs called Trust Domains (TD).  SEAM VMX root is also
> > > 
> > > "crypto protected"?  What the heck is that?
> > 
> > How about "crypto-protected"?  I googled and it seems it is used by someone
> > else.
> 
> Cryptography itself doesn't provide (much) protection in the TDX
> architecture.  TDX guests are isolated from the VMM in ways that
> traditional guests are not, but that has almost nothing to do with
> cryptography.
> 
> Is it cryptography that keeps the host from reading guest private data
> in the clear?  Is it cryptography that keeps the host from reading guest
> ciphertext?  Does cryptography enforce the extra rules of Secure-EPT?

OK, will change to "protected VMs" in this entire series.

> 
> > > > 3. Memory hotplug
> > > > 
> > > > The first generation of TDX architecturally doesn't support memory
> > > > hotplug.  And the first generation of TDX-capable platforms don't support
> > > > physical memory hotplug.  Since it physically cannot happen, this series
> > > > doesn't add any check in ACPI memory hotplug code path to disable it.
> > > > 
> > > > A special case of memory hotplug is adding NVDIMM as system RAM using
> > > > kmem driver.  However the first generation of TDX-capable platforms
> > > > cannot enable TDX and NVDIMM simultaneously, so in practice this cannot
> > > > happen either.
> > > 
> > > What prevents this code from today's code being run on tomorrow's
> > > platforms and breaking these assumptions?
> > 
> > I forgot to add below (which is in the documentation patch):
> > 
> > "This can be enhanced when future generation of TDX starts to support ACPI
> > memory hotplug, or NVDIMM and TDX can be enabled simultaneously on the
> > same platform."
> > 
> > Is this acceptable?
> 
> No, Kai.
> 
> You're basically saying: *this* code doesn't work with feature A, B and
> C.  Then, you're pivoting to say that it doesn't matter because one
> version of Intel's hardware doesn't support A, B, or C.
> 
> I don't care about this *ONE* version of the hardware.  I care about
> *ALL* the hardware that this code will ever support.  *ALL* the hardware
> on which this code will run.
> 
> In 5 years, if someone takes this code and runs it on Intel hardware
> with memory hotplug, CPU hotplug, NVDIMMs *AND* TDX support, what happens?

I thought we could state in the documentation that this code only works on TDX
machines that don't have the above capabilities (SPR for now).  We can then
update both the code and the documentation when we add support for those
features in the future.

If someone takes this code 5 years later, he/she should look at the
documentation and realize that he/she needs a newer kernel if the machine
supports those features.

I'll think about design solutions if the above doesn't look good to you.
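
For the record, the kind of design solution I have in mind is a memory hotplug
notifier that simply refuses to online new memory once TDX has been enabled.  A
rough sketch (platform_tdx_enabled() is only a placeholder for however the
series would track that; this is not code from the series):

#include <linux/init.h>
#include <linux/memory.h>
#include <linux/notifier.h>

/* Placeholder: returns true once the platform has enabled TDX. */
extern bool platform_tdx_enabled(void);

static int tdx_memory_notifier(struct notifier_block *nb,
			       unsigned long action, void *v)
{
	/*
	 * TDMRs are constructed from the memory present when the TDX
	 * module is initialized.  Memory added later can never be TDX
	 * memory, so refuse to online it when TDX is enabled.
	 */
	if (action == MEM_GOING_ONLINE && platform_tdx_enabled())
		return NOTIFY_BAD;

	return NOTIFY_OK;
}

static struct notifier_block tdx_memory_nb = {
	.notifier_call = tdx_memory_notifier,
};

static int __init tdx_memory_hotplug_init(void)
{
	return register_memory_notifier(&tdx_memory_nb);
}
early_initcall(tdx_memory_hotplug_init);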

> 
> You can't just ignore the problems because they're not present on one
> version of the hardware.
> 
> > > > Another case is admin can use 'memmap' kernel command line to create
> > > > legacy PMEMs and use them as TD guest memory, or theoretically, can use
> > > > kmem driver to add them as system RAM.  To avoid having to change memory
> > > > hotplug code to prevent this from happening, this series always includes
> > > > legacy PMEMs when constructing TDMRs so they are also TDX memory.
> > > > 
> > > > 4. CPU hotplug
> > > > 
> > > > The first generation of TDX architecturally doesn't support ACPI CPU
> > > > hotplug.  All logical cpus are enabled by BIOS in MADT table.  Also, the
> > > > first generation of TDX-capable platforms don't support ACPI CPU hotplug
> > > > either.  Since this physically cannot happen, this series doesn't add any
> > > > check in ACPI CPU hotplug code path to disable it.
> > > > 
> > > > Also, only TDX module initialization requires all BIOS-enabled cpus are
> > > > online.  After the initialization, any logical cpu can be brought down
> > > > and brought up to online again later.  Therefore this series doesn't
> > > > change logical CPU hotplug either.
> > > > 
> > > > 5. TDX interaction with kexec()
> > > > 
> > > > If TDX is ever enabled and/or used to run any TD guests, the cachelines
> > > > of TDX private memory, including PAMTs, used by TDX module need to be
> > > > flushed before transiting to the new kernel otherwise they may silently
> > > > corrupt the new kernel.  Similar to SME, this series flushes cache in
> > > > stop_this_cpu().
> > > 
> > > What does this have to do with kexec()?  What's a PAMT?
> > 
> > The point is the dirty cachelines of TDX private memory must be flushed
> > otherwise they may silently corrupt the new kexec()-ed kernel.
> > 
> > Will use "TDX metadata" instead of "PAMT".  The former has already been
> > mentioned above.
> 
> Longer description for the patch itself:
> 
> TDX memory encryption is built on top of MKTME which uses physical
> address aliases to designate encryption keys.  This architecture is not
> cache coherent.  Software is responsible for flushing the CPU caches
> when memory changes keys.  When kexec()'ing, memory can be repurposed
> from TDX use to non-TDX use, changing the effective encryption key.
> 
> Cover-letter-level description:
> 
> Just like SME, TDX hosts require special cache flushing before kexec().

Thanks.
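
For reference, the flush itself would just extend the existing SME wbinvd in
stop_this_cpu(), along these lines (platform_tdx_enabled() is again only a
placeholder for however the series tracks that TDX got enabled, not the actual
patch):

	/*
	 * In stop_this_cpu(), before the CPU halts: dirty cachelines of
	 * TDX private memory (including metadata such as PAMTs) can
	 * silently corrupt the kexec()-ed kernel if they are written back
	 * after the memory has been repurposed with a different KeyID,
	 * so flush them just like the SME case.
	 */
	if (boot_cpu_has(X86_FEATURE_SME) || platform_tdx_enabled())
		native_wbinvd();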

> 
> > > > uninitialized state so it can be initialized again.
> > > > 
> > > > This implies:
> > > > 
> > > >   - If the old kernel fails to initialize TDX, the new kernel cannot
> > > >     use TDX too unless the new kernel fixes the bug which leads to
> > > >     initialization failure in the old kernel and can resume from where
> > > >     the old kernel stops. This requires certain coordination between
> > > >     the two kernels.
> > > 
> > > OK, but what does this *MEAN*?
> > 
> > This means we need to extend the information which the old kernel passes to the
> > new kernel.  But I don't think it's feasible.  I'll refine this kexec() section
> > to make it more concise next version.
> > 
> > > 
> > > >   - If the old kernel has initialized TDX successfully, the new kernel
> > > >     may be able to use TDX if the two kernels have the exactly same
> > > >     configurations on the TDX module. It further requires the new kernel
> > > >     to reserve the TDX metadata pages (allocated by the old kernel) in
> > > >     its page allocator. It also requires coordination between the two
> > > >     kernels.  Furthermore, if kexec() is done when there are active TD
> > > >     guests running, the new kernel cannot use TDX because it's extremely
> > > >     hard for the old kernel to pass all TDX private pages to the new
> > > >     kernel.
> > > > 
> > > > Given that, this series doesn't support TDX after kexec() (unless the
> > > > old kernel didn't attempt to initialize TDX at all).
> > > > 
> > > > And this series doesn't shut down TDX module but leaves it open during
> > > > kexec().  It is because shutting down TDX module requires CPU being in
> > > > VMX operation but there's no guarantee of this during kexec().  Leaving
> > > > the TDX module open is not the best case, but it is OK since the new
> > > > kernel won't be able to use TDX anyway (therefore TDX module won't run
> > > > at all).
> > > 
> > > tl;dr: kexec() doesn't work with this code.
> > > 
> > > Right?
> > > 
> > > That doesn't seem good.
> > 
> > It can work in my understanding.  We just need to flush cache before booting to
> > the new kernel.
> 
> What about all the concerns about TDX module configuration changing?
> 

Leaving the TDX module in a fully initialized state, or in a shutdown state (in
case of an error during its initialization), for the new kernel is fine.  If the
new kernel doesn't use TDX at all, then the TDX module won't access memory using
its global TDX KeyID.  If the new kernel wants to use TDX, it will fail on the
very first SEAMCALL when it tries to initialize the TDX module, and won't use
SEAMCALL to call the TDX module again.  If the new kernel doesn't follow this,
then either it has a bug or it is malicious, in which case it can potentially
corrupt the data.  But I don't think we need to consider that case: if the new
kernel is malicious, it can corrupt data anyway.
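
To illustrate the "fails on the very first SEAMCALL and never calls the module
again" behaviour, something like the below (seamcall(), TDH_SYS_INIT and the
status values are placeholders here, not necessarily what the series uses):

enum tdx_module_status_t {
	TDX_MODULE_UNKNOWN,
	TDX_MODULE_INITIALIZED,
	TDX_MODULE_ERROR,
};

static enum tdx_module_status_t tdx_module_status = TDX_MODULE_UNKNOWN;

static int init_tdx_module(void)
{
	u64 err;

	/* Never retry once a previous attempt has failed. */
	if (tdx_module_status == TDX_MODULE_ERROR)
		return -EINVAL;
	if (tdx_module_status == TDX_MODULE_INITIALIZED)
		return 0;

	/*
	 * The very first SEAMCALL.  If the old (kexec()-ed from) kernel
	 * left the module initialized or shut down, this fails, the
	 * error is latched, and no further SEAMCALLs are made.
	 */
	err = seamcall(TDH_SYS_INIT, 0, 0, 0, 0);
	if (err) {
		tdx_module_status = TDX_MODULE_ERROR;
		return -EIO;
	}

	tdx_module_status = TDX_MODULE_INITIALIZED;
	return 0;
}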

Does this make sense?

Are there any other concerns that I missed?

-- 
Thanks,
-Kai

