linux-kernel - Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220311171848.g5wobw3rmi4e2zkd@black.fi.intel.com>
Date:   Fri, 11 Mar 2022 20:18:48 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org,
        sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
        ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
        hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
        joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
        pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
        tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
        thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO

On Thu, Mar 10, 2022 at 09:53:01AM -0800, Dave Hansen wrote:
> On 3/10/22 08:48, Kirill A. Shutemov wrote:
> I think this is good enough:
> 
>    - All guest code is expected to be in TD-private memory.  Being
>      private to the TD, VMMs have no way to access TD-private memory and
>      no way to read the instruction to decode and emulate it.

Looks good.

One remark: executing from shared memory (or walking page tables in shared
memory) triggers #PF.

> 
> We don't have to rehash what private memory is or how it is implemented.
> 
> >> This problem was laid out as having three cases:
> >> 1. virtio
> >> 2. x86-specific drivers
> >> 3. random drivers (everything else)
> >>
> >> #1 could be done with paravirt
> >> #2 is unspecified and unknown
> >> #3 use doesn't as far as I know exist in TDX guests today
> > 
> > #2 doesn't matter from performance point of view and there is no
> > convenient place where they can be intercepted as they are scattered
> > across the tree. Patching them doesn't bring any benefit, only pain.
> 
> I'd feel a lot better if this was slightly better specified.  Even
> booting with a:
> 
> 	printf("rip: %lx\n", regs->rip);
> 
> in the #VE handler would give some hard data about these.  This still
> feels to me like something that Sean got booting two years ago and
> nobody has really reconsidered.

Here the list I see on boot. It highly depends on QEMU setup. Any form of
device filtering will cut the further.

MMIO: ahci_enable_ahci
MMIO: ahci_freeze
MMIO: ahci_init_controller
MMIO: ahci_port_resume
MMIO: ahci_postreset
MMIO: ahci_reset_controller
MMIO: ahci_save_initial_config
MMIO: ahci_scr_read
MMIO: ahci_scr_write
MMIO: ahci_set_em_messages
MMIO: ahci_start_engine
MMIO: ahci_stop_engine
MMIO: ahci_thaw
MMIO: ioapic_set_affinity
MMIO: ioread16
MMIO: ioread32
MMIO: ioread8
MMIO: iowrite16
MMIO: iowrite32
MMIO: iowrite8
MMIO: mask_ioapic_irq
MMIO: mp_irqdomain_activate
MMIO: mp_irqdomain_deactivate
MMIO: native_io_apic_read
MMIO: __pci_enable_msix_range
MMIO: pci_mmcfg_read
MMIO: pci_msi_mask_irq
MMIO: pci_msi_unmask_irq
MMIO: __pci_write_msi_msg
MMIO: restore_ioapic_entries
MMIO: startup_ioapic_irq
MMIO: update_no_reboot_bit_mem

ioread*/iowrite* comes from virtio.

> > #3 some customers already declared that they will use device passthough
> > (yes, it is not safe). CSP may want to emulate random device, depending on
> > setup. Like, a power button or something.
> 
> I'm not sure I'm totally on board with that.
> 
> But, let's try to make a coherent changelog out of that mess.
> 
> 	This approach is bad for performance.  But, it has (virtually)
> 	no impact on the size of the kernel image and will work for a
> 	wide variety of drivers.  This allows TDX deployments to use
> 	arbitrary devices and device drivers, including virtio.  TDX
> 	customers have asked for the capability to use random devices in
> 	their deployments.
> 
> 	In other words, even if all of the work was done to
> 	paravirtualize all x86 MMIO users and virtio, this approach
> 	would still be needed.  There is essentially no way to get rid
> 	of this code.
> 
> 	This approach is functional for all in-kernel MMIO users current
> 	and future and does so with a minimal amount of code and kernel
> 	image bloat.
> 
> Does that summarize it?

I will integrate it in the commit message.

> But, I'd like to see one of two things:
> 1. Change the BUG()s to WARN()s.
> 2. Make it utterly clear that handle_mmio() is for handling kernel MMIO
>    only.  Definitely change the naming, possibly add a check for
>    user_mode().  In other words, make it even _less_ generic.
> 
> None of that should be hard.

Okay, I will downgrade BUG() to WARN() and return false for user_mode()
with warning.

> BTW, the BUG()s made me think about how the gp_try_fixup_and_notify()
> code would work for MMIO.  For instance, are there any places where
> fixup might be done for MMIO?  If so, an earlier BUG() wouldn't allow
> the fixup to occur.

I can be wrong, but I don't think we do fixups for MMIO.

> Do we *WANT* #VE's to be exposed to the #GP fixup machinery?

We need the fixup at least for MSRs.

-- 
 Kirill A. Shutemov