[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<SN6PR02MB41576FC45B824CAD588DC04DD472A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Wed, 18 Jun 2025 16:18:30 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Roman Kisel <romank@...ux.microsoft.com>, "alok.a.tiwari@...cle.com"
<alok.a.tiwari@...cle.com>, "arnd@...db.de" <arnd@...db.de>, "bp@...en8.de"
<bp@...en8.de>, "corbet@....net" <corbet@....net>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"decui@...rosoft.com" <decui@...rosoft.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "hpa@...or.com" <hpa@...or.com>,
"kys@...rosoft.com" <kys@...rosoft.com>, "mingo@...hat.com"
<mingo@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"wei.liu@...nel.org" <wei.liu@...nel.org>, "linux-arch@...r.kernel.org"
<linux-arch@...r.kernel.org>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
CC: "apais@...rosoft.com" <apais@...rosoft.com>, "benhill@...rosoft.com"
<benhill@...rosoft.com>, "bperkins@...rosoft.com" <bperkins@...rosoft.com>,
"sunilmut@...rosoft.com" <sunilmut@...rosoft.com>
Subject: RE: [PATCH hyperv-next v3 06/15] Drivers: hv: Allocate the paravisor
SynIC pages when required
From: Roman Kisel <romank@...ux.microsoft.com> Sent: Tuesday, June 3, 2025 5:44 PM
>
> The paravisor needs the SynIC pages to communicate with the guest
> via the confidential VMBus.
>
> Refactor and extaned the exisitng code to account for that.
Suggest adding a bit more context to the commit message:
Confidential VMBus requires interacting with two SynICs -- one
provided by the host hypervisor, and one provided by the paravisor.
Each SynIC requires its own message and event pages.
Refactor and extend the existing code to add allocating and freeing
the message and event pages for the paravisor SynIC when it is
present.
>
> Signed-off-by: Roman Kisel <romank@...ux.microsoft.com>
> ---
> drivers/hv/hv.c | 184 +++++++++++++++++++-------------------
> drivers/hv/hyperv_vmbus.h | 17 ++++
> 2 files changed, 111 insertions(+), 90 deletions(-)
>
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 964b9102477d..e25c91eb6af5 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -94,10 +94,70 @@ int hv_post_message(union hv_connection_id connection_id,
> return hv_result(status);
> }
>
> +static int hv_alloc_page(unsigned int cpu, void **page, bool decrypt,
> + const char *note)
Why have a "cpu" argument to this function? It's not used anywhere.
> +{
> + int ret = 0;
> +
> + /*
> + * After the page changes its encryption status, its contents might
> + * appear scrambled on some hardware. Thus `get_zeroed_page` would
> + * zero the page out in vain, we do that ourselves exactly once.
s/we do that/so do that explicitly exactly once/
Avoid personal pronouns like "we".
> + *
> + * By default, the page is allocated encrypted provided the system
> + * supports that.
More precise: "the page is allocated encrypted in a CoCo VM"
> + */
> + *page = (void *)__get_free_page(GFP_KERNEL);
> + if (!*page)
> + return -ENOMEM;
> +
> + if (decrypt)
> + ret = set_memory_decrypted((unsigned long)*page, 1);
> + if (ret)
> + goto failed;
> +
> + memset(*page, 0, PAGE_SIZE);
> + return 0;
> +
> +failed:
> +
Don't need a blank line here.
> + pr_err("allocation failed for %s page, error %d when allocating the page, decrypted %d\n",
The "when allocating the page" portion of the above message is somewhat redundant.
Compare with the similar message in hv_free_page().
> + note, ret, decrypt);
> + free_page((unsigned long)*page);
I don't think the page should be freed here. When set_memory_decrypted() fails, the
encryption state of the memory is unknown, so it should not be put back on the free list.
It's the same situation as in hv_free_page().
> + *page = NULL;
> + return ret;
> +}
> +
> +static int hv_free_page(void **page, bool encrypt, const char *note)
> +{
> + int ret = 0;
> +
> + if (!*page)
> + return 0;
> +
> + if (encrypt)
> + ret = set_memory_encrypted((unsigned long)*page, 1);
> +
> + /*
> + * In the case of the action failure, the page is leaked.
> + * Something is wrong, prefer to lose the page and stay afloat.
> + */
> + if (ret) {
> + pr_err("deallocation failed for %s page, error %d, encrypt %d\n",
> + note, ret, encrypt);
> + } else
> + free_page((unsigned long)*page);
> +
> + *page = NULL;
> +
> + return ret;
> +}
> +
> int hv_synic_alloc(void)
> {
> int cpu, ret = -ENOMEM;
> struct hv_per_cpu_context *hv_cpu;
> + const bool decrypt = !vmbus_is_confidential();
>
> /*
> * First, zero all per-cpu memory areas so hv_synic_free() can
> @@ -123,73 +183,37 @@ int hv_synic_alloc(void)
> vmbus_on_msg_dpc, (unsigned long)hv_cpu);
>
> if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> - hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
> - if (!hv_cpu->post_msg_page) {
> - pr_err("Unable to allocate post msg page\n");
> - goto err;
> - }
> -
> - ret = set_memory_decrypted((unsigned long)hv_cpu->post_msg_page, 1);
> - if (ret) {
> - pr_err("Failed to decrypt post msg page: %d\n", ret);
> - /* Just leak the page, as it's unsafe to free the page. */
> - hv_cpu->post_msg_page = NULL;
> + ret = hv_alloc_page(cpu, &hv_cpu->post_msg_page,
> + decrypt, "post msg");
> + if (ret)
> goto err;
> - }
> -
> - memset(hv_cpu->post_msg_page, 0, PAGE_SIZE);
> }
>
> /*
> - * Synic message and event pages are allocated by paravisor.
> - * Skip these pages allocation here.
> + * If these SynIC pages are not allocated, SIEF and SIM pages
> + * are configured using what the root partition or the paravisor
> + * provides upon reading the SIEFP and SIMP registers.
> */
> if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
> - hv_cpu->hyp_synic_message_page =
> - (void *)get_zeroed_page(GFP_ATOMIC);
> - if (!hv_cpu->hyp_synic_message_page) {
> - pr_err("Unable to allocate SYNIC message page\n");
> + ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_message_page,
> + decrypt, "hypervisor SynIC msg");
> + if (ret)
> goto err;
> - }
> -
> - hv_cpu->hyp_synic_event_page =
> - (void *)get_zeroed_page(GFP_ATOMIC);
> - if (!hv_cpu->hyp_synic_event_page) {
> - pr_err("Unable to allocate SYNIC event page\n");
> -
> - free_page((unsigned long)hv_cpu->hyp_synic_message_page);
> - hv_cpu->hyp_synic_message_page = NULL;
> + ret = hv_alloc_page(cpu, &hv_cpu->hyp_synic_event_page,
> + decrypt, "hypervisor SynIC event");
> + if (ret)
> goto err;
> }
> - }
>
> - if (!ms_hyperv.paravisor_present &&
> - (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> - ret = set_memory_decrypted((unsigned long)
> - hv_cpu->hyp_synic_message_page, 1);
> - if (ret) {
> - pr_err("Failed to decrypt SYNIC msg page: %d\n", ret);
> - hv_cpu->hyp_synic_message_page = NULL;
> -
> - /*
> - * Free the event page here so that hv_synic_free()
> - * won't later try to re-encrypt it.
> - */
> - free_page((unsigned long)hv_cpu->hyp_synic_event_page);
> - hv_cpu->hyp_synic_event_page = NULL;
> + if (vmbus_is_confidential()) {
> + ret = hv_alloc_page(cpu, &hv_cpu->para_synic_message_page,
> + decrypt, "paravisor SynIC msg");
Shouldn't the "decrypt" parameter just always be passed as "false"? That's the
fundamental tenet of Confidential VMBus -- these pages should not be decrypted.
> + if (ret)
> goto err;
> - }
> -
> - ret = set_memory_decrypted((unsigned long)
> - hv_cpu->hyp_synic_event_page, 1);
> - if (ret) {
> - pr_err("Failed to decrypt SYNIC event page: %d\n", ret);
> - hv_cpu->hyp_synic_event_page = NULL;
> + ret = hv_alloc_page(cpu, &hv_cpu->para_synic_event_page,
> + decrypt, "paravisor SynIC event");
Same here. "decrypt" is always "false".
> + if (ret)
> goto err;
> - }
> -
> - memset(hv_cpu->hyp_synic_message_page, 0, PAGE_SIZE);
> - memset(hv_cpu->hyp_synic_event_page, 0, PAGE_SIZE);
> }
> }
Refactoring this function with the hv_alloc_page() helper function works out
very nicely! The code is simpler and the error handling is much easier to get right.
>
> @@ -205,48 +229,28 @@ int hv_synic_alloc(void)
>
> void hv_synic_free(void)
> {
> - int cpu, ret;
> + int cpu;
> + const bool encrypt = !vmbus_is_confidential();
>
> for_each_present_cpu(cpu) {
> struct hv_per_cpu_context *hv_cpu =
> per_cpu_ptr(hv_context.cpu_context, cpu);
>
> - /* It's better to leak the page if the encryption fails. */
> - if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
> - if (hv_cpu->post_msg_page) {
> - ret = set_memory_encrypted((unsigned long)
> - hv_cpu->post_msg_page, 1);
> - if (ret) {
> - pr_err("Failed to encrypt post msg page: %d\n", ret);
> - hv_cpu->post_msg_page = NULL;
> - }
> - }
> + if (ms_hyperv.paravisor_present && hv_isolation_type_tdx())
> + hv_free_page(&hv_cpu->post_msg_page,
> + encrypt, "post msg");
> + if (!ms_hyperv.paravisor_present && !hv_root_partition()) {
> + hv_free_page(&hv_cpu->hyp_synic_event_page,
> + encrypt, "hypervisor SynIC event");
> + hv_free_page(&hv_cpu->hyp_synic_message_page,
> + encrypt, "hypervisor SynIC msg");
> }
> -
> - if (!ms_hyperv.paravisor_present &&
> - (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> - if (hv_cpu->hyp_synic_message_page) {
> - ret = set_memory_encrypted((unsigned long)
> - hv_cpu->hyp_synic_message_page, 1);
> - if (ret) {
> - pr_err("Failed to encrypt SYNIC msg page: %d\n", ret);
> - hv_cpu->hyp_synic_message_page = NULL;
> - }
> - }
> -
> - if (hv_cpu->hyp_synic_event_page) {
> - ret = set_memory_encrypted((unsigned long)
> - hv_cpu->hyp_synic_event_page, 1);
> - if (ret) {
> - pr_err("Failed to encrypt SYNIC event page: %d\n", ret);
> - hv_cpu->hyp_synic_event_page = NULL;
> - }
> - }
> + if (vmbus_is_confidential()) {
> + hv_free_page(&hv_cpu->para_synic_event_page,
> + encrypt, "paravisor SynIC event");
> + hv_free_page(&hv_cpu->para_synic_message_page,
> + encrypt, "paravisor SynIC msg");
As with hv_synic_alloc(), for these two calls, always "false" for the "encrypt"
parameter.
> }
> -
> - free_page((unsigned long)hv_cpu->post_msg_page);
> - free_page((unsigned long)hv_cpu->hyp_synic_event_page);
> - free_page((unsigned long)hv_cpu->hyp_synic_message_page);
> }
Same here on the refactoring. Very nice!
>
> kfree(hv_context.hv_numa_map);
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index fc3cdb26ff1a..9619edcf9f88 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -120,8 +120,25 @@ enum {
> * Per cpu state for channel handling
> */
> struct hv_per_cpu_context {
> + /*
> + * SynIC pages for communicating with the host.
> + *
> + * These pages are accessible to the host partition and the hypervisor,
> + * so they can only be used for exchanging data when the host partition
> + * and the hypervisor are trusted.
This comment isn't quite accurate. The hypervisor SynIC and its message/event
pages are used in today's CoCo VMs (without Confidential VMBus) where the host
partition and hypervisor are not trusted. The guest must be prepared for malicious
behavior by the SynIC, but that doesn't prevent today's CoCo VMs from providing
the intended confidentiality.
> + */
> void *hyp_synic_message_page;
> void *hyp_synic_event_page;
> + /*
> + * SynIC pages for communicating with the paravisor.
> + *
> + * These pages can be accessed only from within the guest partition.
> + * Neither the host partition nor the hypervisor can access these pages,
> + * so they can be used for exchanging data when the host partition and
> + * the hypervisor are not trusted, such as in a confidential VM.
Same here on this comment.
> + */
> + void *para_synic_message_page;
> + void *para_synic_event_page;
>
> /*
> * The page is only used in hv_post_message() for a TDX VM (with the
> --
> 2.43.0
Powered by blists - more mailing lists