linux-kernel - RE: [EXTERNAL] Re: "Paravisor" Feature Enumeration

Message-ID:
 <CH8PR21MB52227BAB91A6649DE8A74A2FCA85A@CH8PR21MB5222.namprd21.prod.outlook.com>
Date: Thu, 8 Jan 2026 06:53:03 +0000
From: Jon Lange <jlange@...rosoft.com>
To: "dan.j.williams@...el.com" <dan.j.williams@...el.com>, Andrew Cooper
	<andrew.cooper3@...rix.com>, Dave Hansen <dave.hansen@...el.com>
CC: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
	<pbonzini@...hat.com>, John Starks <John.Starks@...rosoft.com>, Will Deacon
	<will@...nel.org>, Mark Rutland <mark.rutland@....com>,
	"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, LKML
	<linux-kernel@...r.kernel.org>, "Edgecombe, Rick P"
	<rick.p.edgecombe@...el.com>
Subject: RE: [EXTERNAL] Re: "Paravisor" Feature Enumeration

Dan, thanks for taking the time to clarify what you meant by PV.  I couldn't tell whether you were talking about paravisor functionality or paravirtualization operations, and now I get what you mean.

You wrote:

> So, I was trying to get to the actual ops that need to be intercepted, and whether
> every operation that this paravisor wants to intercept already has an existing
> indirection or what new indirections need to be built. This probably becomes
> clearer when you have some time to build an RFC, but the array of operations to
> touch exceeds traditional paravirt hooks.
>
> So, for example, paravirt ops do handle MSR virtualization:

In the case of MSR - or anything else that's part of the core ISA - the paravisor handles this all transparently as part of its role as virtualization support - just like a hypervisor would do (again, that's part of the definition of paravisor mode).  I suspect the pv_ops structure you describe for MSR is designed to handle the abstractions around GHCB/GHCI for fully enlightened VMs, but in the case of a paravisor, the native RDMSR/WRMSR instructions work as expected so no paravirtualization is required.  In the paravisor scenario, this is true for every aspect of basic system execution.  Again, this is part of the core value of the paravisor: it just takes care of everything so the OS doesn't have to understand anything special about the confidential architecture.  To the extent that any pv_ops are required, they should just follow an existing virtualization path, because the paravisor is designed to mirror an established virtualization model.
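
To make that distinction concrete, here is a rough userspace sketch, not kernel code. It assumes a hypothetical two-entry ops table (struct msr_ops), a simulated MSR array, and a counter standing in for explicit guest-to-host protocol calls; none of these names come from the kernel. The only point it illustrates is that a paravisor guest would keep the native handlers installed, while a fully enlightened guest would swap in handlers that route through an explicit protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an MSR ops table; NOT the kernel's pv_cpu_ops. */
struct msr_ops {
	uint64_t (*read_msr)(uint32_t msr);
	void (*write_msr)(uint32_t msr, uint64_t val);
};

/* Hypothetical guest modes used only for this illustration. */
enum guest_mode { GUEST_PARAVISOR, GUEST_FULLY_ENLIGHTENED };

static uint64_t fake_msr_file[4];	/* simulated MSR storage */
static unsigned int protocol_calls;	/* explicit guest-to-host calls made */

/* "Native" path: models executing RDMSR/WRMSR directly. */
static uint64_t native_read_msr(uint32_t msr)
{
	return fake_msr_file[msr % 4];
}

static void native_write_msr(uint32_t msr, uint64_t val)
{
	fake_msr_file[msr % 4] = val;
}

/* Enlightened path: models routing each access through an explicit protocol. */
static uint64_t enlightened_read_msr(uint32_t msr)
{
	protocol_calls++;
	return fake_msr_file[msr % 4];
}

static void enlightened_write_msr(uint32_t msr, uint64_t val)
{
	protocol_calls++;
	fake_msr_file[msr % 4] = val;
}

/* Pick the handlers for a guest mode; the paravisor case stays native. */
struct msr_ops select_msr_ops(enum guest_mode mode)
{
	struct msr_ops ops = { native_read_msr, native_write_msr };

	if (mode == GUEST_FULLY_ENLIGHTENED) {
		ops.read_msr = enlightened_read_msr;
		ops.write_msr = enlightened_write_msr;
	}
	return ops;
}
```

In this model the paravisor case never touches protocol_calls, which is the property described above: the guest issues the ordinary instruction and whatever interception happens is invisible to it.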

> So my curiosity is whether there are other operations to capture that are
> buried deeper in the arch implementations that do not have abstractions
> today. Again, that is probably best addressed by an RFC implementation.

This is the big question, and I agree that we're not going to get very far until we start building real code.  In the example of attestation, I suspect that nothing special is required; the existing SNP and TDX platform services used by the OS should work transparently when running under a paravisor; SNP_GUEST_REQUEST over GHCB should behave as expected, and TDG.MR.REPORT will be intercepted and emulated by the L1, so no new convention should be required in either case.  The Arm CCA Planes architecture is not mature enough yet to be a firm basis for conjecture about how attestation report requests are managed, but I expect it to follow the same pattern as TDX and therefore should also work transparently.

The TDISP scenarios are much less clear, due in no small part to the fact that there is no code in the kernel yet to handle TDISP even for fully enlightened guests (as you are keenly aware).  As we design those interfaces for fully enlightened guests, it wouldn't be a bad idea to discuss how they would be handled in the paravisor case so we can minimize the need for pv_ops to handle the various configurations, but I don't want to predict how this unfolds until we actually have a real design for what TDISP negotiation will look like in at least one configuration.

-Jon

-----Original Message-----
From: dan.j.williams@...el.com <dan.j.williams@...el.com> 
Sent: Wednesday, January 7, 2026 10:42 AM
To: Jon Lange <jlange@...rosoft.com>; dan.j.williams@...el.com; Andrew Cooper <andrew.cooper3@...rix.com>; Dave Hansen <dave.hansen@...el.com>
Cc: Sean Christopherson <seanjc@...gle.com>; Paolo Bonzini <pbonzini@...hat.com>; John Starks <John.Starks@...rosoft.com>; Will Deacon <will@...nel.org>; Mark Rutland <mark.rutland@....com>; linux-coco@...ts.linux.dev; LKML <linux-kernel@...r.kernel.org>; Edgecombe, Rick P <rick.p.edgecombe@...el.com>
Subject: RE: [EXTERNAL] Re: "Paravisor" Feature Enumeration

Jon Lange wrote:
> Dan W wrote:
> 
> > It sounds like the paravisor is going to hide confidential memory 
> > management details like page-acceptance, but it is going to 
> > advertise and intercept higher order operations like generate launch 
> > attestation report and TDISP paths like lock device, get device 
> > report, accept/run device.
> 
> I think that's roughly the right mental model.  The paravisor will 
> additionally hide confidential details like MSR virtualization, I/O 
> and MMIO handling, CPUID virtualization - all of the sorts of things 
> that would generate #VE/#VC exceptions in a fully enlightened guest so 
> that the guest doesn't have to worry about those, and the paravisor 
> can provide useful functionality (like device emulation or 
> hypervisor-type functionality) through those primitives.

Ah, anything that causes #VE/#VC helps, thanks.

> > So does this paravisor need low level intercepts via pv_ops and a 
> > confidential memory-management model independent of TDX/SNP etc? Or, 
> > does it only need the higher order common "services" like 
> > attestation and TDISP.
> 
> I'm not following your question - I don't understand what you're 
> envisioning when you describe confidential memory management 
> independent of TDX/SNP.  It is the case that the paravisor is 
> responsible for the confidentiality state of all memory, and therefore 
> it will have some implementation to fulfill this responsibility.  It's 
> natural for it to do so because its own operation has to integrate 
> with the state of memory.  Following my earlier analogy that the 
> paravisor acts like a nested hypervisor for a single (confidential) 
> guest, the paravisor itself will have to implement all of the services 
> necessary to satisfy the virtualization requirements of an 
> unenlightened guest, which is far more than the "common services" that 
> you mention.  Can you give some other examples of the sort of 
> distinction you're trying to highlight?

So, I was trying to get to the actual ops that need to be intercepted, and whether every operation that this paravisor wants to intercept already has an existing indirection or what new indirections need to be built. This probably becomes clearer when you have some time to build an RFC, but the array of operations to touch exceeds traditional paravirt hooks.

So, for example, paravirt ops do handle MSR virtualization:

struct pv_cpu_ops {
...
        u64 (*read_msr)(u32 msr);
        void (*write_msr)(u32 msr, u64 val);
...
};

Other operations are outside of paravirt hooks but do have generic abstractions, like these for encrypted memory:

struct x86_guest {
        int (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
        int (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
        bool (*enc_tlb_flush_required)(bool enc);
        bool (*enc_cache_flush_required)(void);
        void (*enc_kexec_begin)(void);
        void (*enc_kexec_finish)(void);
};
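
As a rough illustration of how hooks like these get used, the following userspace sketch models a generic helper that calls a prepare hook, performs the change, and then calls a finish hook. It is a simplified model under stated assumptions: struct enc_hooks, change_enc_status(), and counting_finish() are hypothetical names, only two of the hooks are modeled, and the real page-table update is replaced by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of two of the hooks quoted above. */
struct enc_hooks {
	int (*status_change_prepare)(unsigned long vaddr, int npages, bool enc);
	int (*status_change_finish)(unsigned long vaddr, int npages, bool enc);
};

/* Defaults: a platform with nothing to do simply reports success. */
static int default_prepare(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr; (void)npages; (void)enc;
	return 0;
}

static int default_finish(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr; (void)npages; (void)enc;
	return 0;
}

struct enc_hooks default_enc_hooks(void)
{
	struct enc_hooks h = { default_prepare, default_finish };
	return h;
}

static int pages_converted;	/* pages seen by the example override */

/* Example platform override: records how many pages changed state. */
int counting_finish(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr; (void)enc;
	pages_converted += npages;
	return 0;
}

/* Generic caller: prepare, change, finish; stop on the first error. */
int change_enc_status(const struct enc_hooks *h, unsigned long vaddr,
		      int npages, bool enc)
{
	int ret = h->status_change_prepare(vaddr, npages, enc);

	if (ret)
		return ret;
	/* A real kernel would update page-table encryption bits here. */
	return h->status_change_finish(vaddr, npages, enc);
}
```

The design point is that the generic caller does not know which confidential architecture it runs on; a platform that needs to act on a conversion overrides a hook, and one that does not leaves the defaults in place.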

For attestation operations this effort would need to register its own tsm_report interface:

tsm_report_register(...)

...and for TDISP it would probably need to register its own TSM device:

struct_group_tagged(pci_tsm_devsec_ops, devsec_ops,
	struct pci_tsm *(*lock)(struct tsm_dev *tsm_dev,
				struct pci_dev *pdev);
	void (*unlock)(struct pci_tsm *tsm);
	int (*accept)(struct pci_dev *pdev);
);

So my curiosity is whether there are other operations to capture that are buried deeper in the arch implementations that do not have abstractions today. Again, that is probably best addressed by an RFC implementation.