linux-kernel - Re: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230725231233.mu2yarso3tcamimu@box.shutemov.name>
Date:   Wed, 26 Jul 2023 02:12:33 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        "dave.hansen@...el.com" <dave.hansen@...el.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>, Dexuan Cui <decui@...rosoft.com>,
        "rick.p.edgecombe@...el.com" <rick.p.edgecombe@...el.com>,
        "sathyanarayanan.kuppuswamy@...ux.intel.com" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCHv3 0/3] x86/tdx: Fix one more load_unaligned_zeropad()
 issue

On Tue, Jul 25, 2023 at 03:51:24PM +0000, Michael Kelley (LINUX) wrote:
> From: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> Sent: Monday, July 24, 2023 4:19 PM
> > 
> > On Thu, Jul 13, 2023 at 02:43:39PM +0000, Michael Kelley (LINUX) wrote:
> > > From: Kirill A. Shutemov <kirill@...temov.name> Sent: Saturday, July 8, 2023 11:09 PM
> > > >
> > > > On Sat, Jul 08, 2023 at 11:53:08PM +0000, Michael Kelley (LINUX) wrote:
> > > > > From: Kirill A. Shutemov <kirill@...temov.name> Sent: Friday, July 7, 2023 7:07 AM
> > > > > >
> > > > > > On Thu, Jul 06, 2023 at 04:48:32PM +0000, Michael Kelley (LINUX) wrote:
> > > > > > > From: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> Sent: Tuesday, June 6, 2023 2:56 AM
> > > > >
> > > > > [snip]
> > > > >
> > > > > >
> > > > > > It only addresses the problem that happens on transition, but
> > > > > > load_unaligned_zeropad() is still a problem for the shared mappings in
> > > > > > general, after transition is complete. Like if load_unaligned_zeropad()
> > > > > > steps from private to shared mapping and shared mapping triggers #VE,
> > > > > > kernel should be able to handle it.
> > > > >
> > > > > I'm showing my ignorance of TDX architectural details, but what's the
> > > > > situation where shared mappings in general can trigger a #VE?  How
> > > > > do such situations get handled for references that aren't from
> > > > > load_unaligned_zeropad()?
> > > > >
> > > >
> > > > Shared mappings are under host/VMM control. It can just not map the page
> > > > in shared-ept and trigger ept-violation #VE.
> > >
> > > I know you are out on vacation, but let me follow up now for further
> > > discussion when you are back.
> > >
> > > Isn't the scenario you are describing a malfunctioning or malicious
> > > host/VMM?  Would what you are describing be done as part of normal
> > > operation?   Kernel code must have switched the page from private to
> > > shared for some purpose.  As soon as that code (which presumably
> > > does not have any entry in the exception table) touches the page, it
> > > would take the #VE and the enter the die path because there's no fixup.
> > > So is there value in having load_unaligned_zeropad() handle the #VE and
> > > succeed where a normal reference would fail?
> > 
> > #VE on shared memory is legitimately used for MMIO. But MMIO region is
> > usually separate from the real memory in physical address space.
> > 
> > But we also have DMA.
> > 
> > DMA pages allocated from common pool of memory and they can be next to
> > dentry cache that kernel accesses with load_unaligned_zeropad(). DMA pages
> > are shared, but they usually backed by memory and not cause #VE. However
> > shared memory is under full control from VMM and VMM can remove page at
> > any point which would make platform to deliver #VE to the guest. This is
> > pathological scenario, but I think it still worth handling gracefully.
> 
> Yep, pages targeted by DMA have been transitioned to shared, and could
> be scattered anywhere in the guest physical address space.  Such a page
> could be touched by load_unaligned_zeropad().  But could you elaborate
> more on the "pathological scenario" where the guest physical page isn't
> backed by memory?
> 
> Sure, the VMM can remove the page at any point, but why would it do
> so?  Is doing so a legitimate part of the host/guest contract that the guest
> must handle cleanly?  More importantly, what is the guest expected to
> do in such a case?  I would expect that at some point such a DMA page
> is accessed by a guest vCPU with an explicit reference that is not
> load_unaligned_zeropad().  Then the guest would take a #VE that
> goes down the #GP path and invokes die().
> 
> I don't object to make the load_unaligned_zeropad() fixup path work
> correctly in response to a #VE, as it seems like general goodness.  I'm
> just trying to make sure I understand the nuances of "why".  :-)

We actually triggered the issue during internal testing. I wrote initial
patch after that. But looking back on the report I don't have an answer
why the page triggered #VE. Maybe VMM or virtual BIOS screwed up and put
MMIO next to normal memory, I donno.

> > > I'd still like to see the private <-> shared transition code mark the pages
> > > as invalid during the transition, and avoid the possibility of #VE and
> > > similar cases with SEV-SNP.  Such approach reduces (eliminates?)
> > > entanglement between CoCo-specific exceptions and
> > > load_unaligned_zeropad().  It also greatly simplifies TD Partition cases
> > > and SEV-SNP cases where a paravisor is used.
> > 
> > I doesn't eliminates issue for TDX as the scenario above is not transient.
> > It can happen after memory is converted to shared.
> 
> Notwithstanding, do you have any objections to the private <-> shared
> transition code being changed so it won't be the cause of #VE, and
> similar on SEV-SNP?

I am not yet convinced it is needed. But let's see the code.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov