[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220311171848.g5wobw3rmi4e2zkd@black.fi.intel.com>
Date: Fri, 11 Mar 2022 20:18:48 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
luto@...nel.org, peterz@...radead.org,
sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO
On Thu, Mar 10, 2022 at 09:53:01AM -0800, Dave Hansen wrote:
> On 3/10/22 08:48, Kirill A. Shutemov wrote:
> I think this is good enough:
>
> - All guest code is expected to be in TD-private memory. Being
> private to the TD, VMMs have no way to access TD-private memory and
> no way to read the instruction to decode and emulate it.
Looks good.
One remark: executing from shared memory (or walking page tables in shared
memory) triggers #PF.
>
> We don't have to rehash what private memory is or how it is implemented.
>
> >> This problem was laid out as having three cases:
> >> 1. virtio
> >> 2. x86-specific drivers
> >> 3. random drivers (everything else)
> >>
> >> #1 could be done with paravirt
> >> #2 is unspecified and unknown
> >> #3 use doesn't as far as I know exist in TDX guests today
> >
> > #2 doesn't matter from performance point of view and there is no
> > convenient place where they can be intercepted as they are scattered
> > across the tree. Patching them doesn't bring any benefit, only pain.
>
> I'd feel a lot better if this was slightly better specified. Even
> booting with a:
>
> printf("rip: %lx\n", regs->rip);
>
> in the #VE handler would give some hard data about these. This still
> feels to me like something that Sean got booting two years ago and
> nobody has really reconsidered.
Here the list I see on boot. It highly depends on QEMU setup. Any form of
device filtering will cut the further.
MMIO: ahci_enable_ahci
MMIO: ahci_freeze
MMIO: ahci_init_controller
MMIO: ahci_port_resume
MMIO: ahci_postreset
MMIO: ahci_reset_controller
MMIO: ahci_save_initial_config
MMIO: ahci_scr_read
MMIO: ahci_scr_write
MMIO: ahci_set_em_messages
MMIO: ahci_start_engine
MMIO: ahci_stop_engine
MMIO: ahci_thaw
MMIO: ioapic_set_affinity
MMIO: ioread16
MMIO: ioread32
MMIO: ioread8
MMIO: iowrite16
MMIO: iowrite32
MMIO: iowrite8
MMIO: mask_ioapic_irq
MMIO: mp_irqdomain_activate
MMIO: mp_irqdomain_deactivate
MMIO: native_io_apic_read
MMIO: __pci_enable_msix_range
MMIO: pci_mmcfg_read
MMIO: pci_msi_mask_irq
MMIO: pci_msi_unmask_irq
MMIO: __pci_write_msi_msg
MMIO: restore_ioapic_entries
MMIO: startup_ioapic_irq
MMIO: update_no_reboot_bit_mem
ioread*/iowrite* comes from virtio.
> > #3 some customers already declared that they will use device passthough
> > (yes, it is not safe). CSP may want to emulate random device, depending on
> > setup. Like, a power button or something.
>
> I'm not sure I'm totally on board with that.
>
> But, let's try to make a coherent changelog out of that mess.
>
> This approach is bad for performance. But, it has (virtually)
> no impact on the size of the kernel image and will work for a
> wide variety of drivers. This allows TDX deployments to use
> arbitrary devices and device drivers, including virtio. TDX
> customers have asked for the capability to use random devices in
> their deployments.
>
> In other words, even if all of the work was done to
> paravirtualize all x86 MMIO users and virtio, this approach
> would still be needed. There is essentially no way to get rid
> of this code.
>
> This approach is functional for all in-kernel MMIO users current
> and future and does so with a minimal amount of code and kernel
> image bloat.
>
> Does that summarize it?
I will integrate it in the commit message.
> But, I'd like to see one of two things:
> 1. Change the BUG()s to WARN()s.
> 2. Make it utterly clear that handle_mmio() is for handling kernel MMIO
> only. Definitely change the naming, possibly add a check for
> user_mode(). In other words, make it even _less_ generic.
>
> None of that should be hard.
Okay, I will downgrade BUG() to WARN() and return false for user_mode()
with warning.
> BTW, the BUG()s made me think about how the gp_try_fixup_and_notify()
> code would work for MMIO. For instance, are there any places where
> fixup might be done for MMIO? If so, an earlier BUG() wouldn't allow
> the fixup to occur.
I can be wrong, but I don't think we do fixups for MMIO.
> Do we *WANT* #VE's to be exposed to the #GP fixup machinery?
We need the fixup at least for MSRs.
--
Kirill A. Shutemov
Powered by blists - more mailing lists