Message-ID: <10b6045e-e5e4-e1f6-f93a-34f1ad61fdfe@semihalf.com>
Date: Sat, 17 Jun 2023 19:43:25 +0200
From: Dmytro Maluka <dmy@...ihalf.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Elena Reshetova <elena.reshetova@...el.com>,
Carlos Bilbao <carlos.bilbao@....com>,
Jason CJ Chen <jason.cj.chen@...el.com>,
"corbet@....net" <corbet@....net>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ardb@...nel.org" <ardb@...nel.org>,
"kraxel@...hat.com" <kraxel@...hat.com>,
"dovmurik@...ux.ibm.com" <dovmurik@...ux.ibm.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"Dhaval.Giani@....com" <Dhaval.Giani@....com>,
"michael.day@....com" <michael.day@....com>,
"pavankumar.paluri@....com" <pavankumar.paluri@....com>,
"David.Kaplan@....com" <David.Kaplan@....com>,
"Reshma.Lal@....com" <Reshma.Lal@....com>,
"Jeremy.Powell@....com" <Jeremy.Powell@....com>,
"sathyanarayanan.kuppuswamy@...ux.intel.com"
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"dgilbert@...hat.com" <dgilbert@...hat.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"dinechin@...hat.com" <dinechin@...hat.com>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
"berrange@...hat.com" <berrange@...hat.com>,
"mst@...hat.com" <mst@...hat.com>, "tytso@....edu" <tytso@....edu>,
"jikos@...nel.org" <jikos@...nel.org>,
"joro@...tes.org" <joro@...tes.org>,
"leon@...nel.org" <leon@...nel.org>,
"richard.weinberger@...il.com" <richard.weinberger@...il.com>,
"lukas@...ner.de" <lukas@...ner.de>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
"cdupontd@...hat.com" <cdupontd@...hat.com>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"sameo@...osinc.com" <sameo@...osinc.com>,
"bp@...en8.de" <bp@...en8.de>,
"security@...nel.org" <security@...nel.org>,
Larry Dewey <larry.dewey@....com>, android-kvm@...gle.com,
Dmitry Torokhov <dtor@...gle.com>,
Allen Webb <allenwebb@...gle.com>,
Tomasz Nowicki <tn@...ihalf.com>,
Grzegorz Jaszczyk <jaz@...ihalf.com>,
Patryk Duda <pdk@...ihalf.com>
Subject: Re: [PATCH v2] docs: security: Confidential computing intro and
threat model for x86 virtualization
On 6/16/23 20:07, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
>> On 6/16/23 15:56, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
>>>> Again, pedantic mode on, I find it difficult to agree with the wording
>>>> that the guest owns "most of" the HW resources it uses. It controls the
>>>> data communication with its hardware device, but other resources (e.g.
>>>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
>>>> host and virtualized by it for the guest.
>>>
>>> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
>>> host does *not* own most resources that are exposed to the guest. My understanding
>>> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
>>
>> Heh, no. Most of these resources are owned by the untrusted host, that's
>> the point.
>
> Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
> that in pKVM, while the untrusted host can withhold resources, it can't subvert
> most of those resources. Taking scheduling as an example, a pKVM vCPU may be
> migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
> safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
> VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.
Yep, agree.
>> Basically for two reasons: 1. we want to keep the trusted hypervisor as
>> simple as possible. 2. we don't need availability guarantees.
>>
>> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
>> counterparts on non-Intel), physical PCI config space (merely for
>> controlling a few critical registers like BARs and MSI address
>> registers), perhaps a few more things that don't come to my mind now.
>
> The "physical PCI config space" is a key difference, and is very relevant to this
> doc (see my response to Allen).
Yeah, thanks for the links and the context, BTW.
But let me clarify that there are two things here that should not be
confused with each other: pKVM has two levels of virtualization of the
PCI config space. The hypervisor traps the host's accesses to the
config space, but mostly it simply passes them through to hardware. Most
importantly, when the host reprograms a BAR, the hypervisor makes sure
to update the corresponding MMIO mappings in the host's and the guest's
2nd-level page tables (that is what makes protection of the protected
guest's passthrough PCI devices possible at all). But essentially it's
the host that manages the physical config space. And the host, in turn,
virtualizes it for the guest, using vfio-pci, like it is traditionally
done for passthrough PCI devices.
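
To make the first level concrete, here is a rough sketch of the trap path
described above. It is a toy model, not actual pKVM code: all names
(hyp_pci_dev, s2_map, hyp_handle_host_cfg_write) are hypothetical, only
aligned dword writes and 32-bit memory BARs are modelled, and the
second-level page tables are reduced to one mapping record per BAR. It also
assumes that for a device assigned to a protected guest the BAR range is
mapped for the guest only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BARS           6
#define PCI_BASE_ADDRESS_0 0x10                /* config offset of BAR0 */
#define PCI_BAR_MEM_MASK   (~(uint32_t)0xF)    /* strip memory BAR flag bits */

/* Toy stand-in for one second-level mapping of a BAR's MMIO range. */
struct s2_map {
	bool valid;
	uint64_t pa;
	uint64_t size;
};

/* Hypothetical per-device state kept by the trusted hypervisor. */
struct hyp_pci_dev {
	uint32_t hw_cfg[64];            /* stands in for the physical config space */
	uint64_t bar_size[NUM_BARS];    /* 0 means the BAR is not implemented */
	bool assigned;                  /* passed through to a protected guest? */
	struct s2_map host_s2[NUM_BARS];
	struct s2_map guest_s2[NUM_BARS];
};

/* Return the BAR index for a config offset, or -1 if it is not a BAR. */
static int bar_index(uint16_t off)
{
	if (off < PCI_BASE_ADDRESS_0 || off >= PCI_BASE_ADDRESS_0 + 4 * NUM_BARS)
		return -1;
	return (off - PCI_BASE_ADDRESS_0) / 4;
}

/* Called when the host's write to the physical config space traps. */
int hyp_handle_host_cfg_write(struct hyp_pci_dev *dev, uint16_t off, uint32_t val)
{
	assert(dev != NULL);

	if (off >= sizeof(dev->hw_cfg) || (off & 3))
		return -1;              /* toy model: aligned dwords only */

	/* Mostly a plain pass-through: the host manages the config space. */
	dev->hw_cfg[off / 4] = val;

	int bar = bar_index(off);
	if (bar < 0 || dev->bar_size[bar] == 0)
		return 0;

	/* The BAR moved: make the second-level mappings follow it. */
	struct s2_map moved = { true, val & PCI_BAR_MEM_MASK, dev->bar_size[bar] };
	struct s2_map none = { false, 0, 0 };

	dev->host_s2[bar] = dev->assigned ? none : moved;
	dev->guest_s2[bar] = dev->assigned ? moved : none;
	return 0;
}
```

The point of the sketch is only that the hypervisor reacts to the host's
writes; it does not decide what gets written.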
This latter, emulated config space is the concern. Looking at the
patches [1] and considering whether those MSI-X misconfiguration attacks
are possible in pKVM, I conclude that yes, they are.
Device attestation helps with trusting/verifying static information, but
the dynamically changing config space is something different.
So it seems that such "emulated PCI config misconfiguration attacks"
need to be included in the threat model for pKVM as well, i.e. need to
be hardened on the guest side. Unless we revisit our current design
assumptions for device assignment in pKVM on x86 and manage the physical
PCI config in the trusted hypervisor, not in the host (with all the
increasing complexity that comes with that, related to power management
and other things).
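
For illustration only, guest-side hardening of this kind could look
roughly like the sketch below: before trusting the MSI-X layout read from
the emulated config space, the guest checks it against BAR sizes it
already trusts (e.g. from attested static information). This is my own
minimal example, not what the patches in [1] do; guest_msix_layout_ok()
and its parameters are hypothetical. The register layout it relies on
(BIR in bits 2:0, table size encoded as N-1, 16-byte table entries, one
pending bit per vector in 64-bit units) is the standard MSI-X capability
format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BARS             6
#define MSIX_ENTRY_SIZE      16u      /* bytes per MSI-X table entry */
#define MSIX_BIR_MASK        0x7u     /* BAR indicator, bits 2:0 */
#define MSIX_TABLE_SIZE_MASK 0x7ffu   /* Message Control bits 10:0, N-1 */

/* True if [off, off + len) lies entirely inside a BAR of bar_size bytes. */
static bool range_in_bar(uint64_t off, uint64_t len, uint64_t bar_size)
{
	return len <= bar_size && off <= bar_size - len;
}

/*
 * Validate the MSI-X table and PBA locations reported by the (untrusted)
 * emulated config space against trusted BAR sizes.
 */
bool guest_msix_layout_ok(uint16_t msg_ctl, uint32_t table_reg,
			  uint32_t pba_reg, const uint64_t bar_size[NUM_BARS])
{
	assert(bar_size != NULL);

	uint32_t nvec = (msg_ctl & MSIX_TABLE_SIZE_MASK) + 1;
	uint32_t t_bir = table_reg & MSIX_BIR_MASK;
	uint32_t p_bir = pba_reg & MSIX_BIR_MASK;
	uint64_t t_off = table_reg & ~MSIX_BIR_MASK;
	uint64_t p_off = pba_reg & ~MSIX_BIR_MASK;
	uint64_t t_len = (uint64_t)nvec * MSIX_ENTRY_SIZE;
	uint64_t p_len = (uint64_t)((nvec + 63) / 64) * 8;

	/* BIR values 6 and 7 are reserved. */
	if (t_bir >= NUM_BARS || p_bir >= NUM_BARS)
		return false;

	/* Both structures must fit inside an implemented BAR. */
	if (!range_in_bar(t_off, t_len, bar_size[t_bir]))
		return false;
	if (!range_in_bar(p_off, p_len, bar_size[p_bir]))
		return false;

	/* Table and PBA must not overlap when they share a BAR. */
	if (t_bir == p_bir && t_off < p_off + p_len && p_off < t_off + t_len)
		return false;

	return true;
}
```

A check like this only covers one snapshot; since the config space can
change dynamically, the guest would presumably have to re-validate (or
cache and compare) whenever it re-reads these registers.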
Also, thinking more about it: irrespective of passthrough devices, I
guess that the protected pKVM guest may well want to use virtio with PCI
transport (not for things like networking, but that's not the point),
and would thus be prone to the same attacks.
>> The untrusted host schedules its guests on physical CPUs (i.e. the
>> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
>> has no scheduling, it only handles vmexits from the host and guests. The
>> untrusted host fully controls the physical interrupt controllers (I
>> think we realize that is not perfectly fine, but here we are), etc.
>
> Yeah, IRQs are a tough nut to crack.
And BTW, doesn't it mean that interrupts also need to be hardened in the
guest (if we don't want the complexity of interrupt controllers in the
trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
should also consider interrupt-based timing attacks, which could use
any type of interrupt. (I have no idea how to harden either of the two
cases, but I'm no expert.)
[1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/