Message-ID:
<SN6PR02MB4157D7212FE3F0F50FAB0592D4552@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 31 Oct 2024 19:14:20 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Naman Jain <namjain@...ux.microsoft.com>, "K . Y . Srinivasan"
<kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu
<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, John Starks
<jostarks@...rosoft.com>, "jacob.pan@...ux.microsoft.com"
<jacob.pan@...ux.microsoft.com>, Easwar Hariharan
<eahariha@...ux.microsoft.com>, Saurabh Singh Sengar
<ssengar@...ux.microsoft.com>
Subject: RE: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
From: Naman Jain <namjain@...ux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>
> When resuming from hibernation, log any channels that were present
> before hibernation but now are gone.
> In general, the essential virtual devices configured for a VM, remain
> same, before and after the hibernation and its not very common that
> some offers are missing.

The wording here is a bit jumbled. And let's use consistent terminology.
I'd suggest:

  In general, the boot-time devices configured for a resuming VM should be
  the same as the devices in the VM at the time of hibernation. It's uncommon
  for the configuration to have been changed such that offers are missing.
  Changing the configuration violates the rules for hibernation anyway.

> The cleanup of missing channels is not
> straight-forward and dependent on individual device driver
> functionality and implementation, so it can be added in future as
> separate changes.
>
> Signed-off-by: John Starks <jostarks@...rosoft.com>
> Co-developed-by: Naman Jain <namjain@...ux.microsoft.com>
> Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
> Reviewed-by: Easwar Hariharan <eahariha@...ux.microsoft.com>
> ---
> Changes since v1:
> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
> * Added Easwar's Reviewed-By tag
> * Addressed Saurabh's comments:
> * Added a note for missing channel cleanup in comments and commit msg
> ---
> drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index bd3fc41dc06b..08214f28694a 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
>
>  static int vmbus_bus_resume(struct device *dev)
>  {
> +	struct vmbus_channel *channel;
>  	struct vmbus_channel_msginfo *msginfo;
>  	size_t msgsize;
>  	int ret;
> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>
>  	vmbus_request_offers();
>
> +	mutex_lock(&vmbus_connection.channel_mutex);
> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> +		if (channel->offermsg.child_relid != INVALID_RELID)
> +			continue;
> +
> +		/* hvsock channels are not expected to be present. */
> +		if (is_hvsock_channel(channel))
> +			continue;
> +
> +		pr_err("channel %pUl/%pUl not present after resume.\n",
> +		       &channel->offermsg.offer.if_type,
> +		       &channel->offermsg.offer.if_instance);
> +		/* ToDo: Cleanup these channels here */
> +	}
> +	mutex_unlock(&vmbus_connection.channel_mutex);
> +

Dexuan and John have explained how in Azure VMs, there should not be
any VFs assigned to the VM at the time of hibernation. So the above
check for missing offers does not trigger an error message due to
VF offers coming after the all-offers-received message.

But what about the case of a VM running on a local Hyper-V? I'm not
completely clear, but in that case I don't think any VFs are removed
before the hibernation, especially for VM-initiated hibernation. It's
a reasonable scenario to later resume that same VM, with the same
VF assigned to the VM. Because of the way the current code counts
the offers, vmbus_bus_resume() waits for the VF to be offered again,
and all the channels get correct post-resume relids. But the changes
in this patch set break that scenario. Since vmbus_bus_resume() now
proceeds before the VF offer arrives, hv_pci_resume() calling
vmbus_open() could use the pre-hibernation relid for the VF and break
things. Certainly the "not present after resume" error message would
be spurious.

Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
case with a VF to not work pending later fixes. But I thought I'd call
out the potential issue (assuming my thinking is correct).

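To make the ordering concern concrete, here is a small user-space model
of the check. It is only a sketch, not kernel code: the model_* names
and the sample relids are hypothetical, INVALID_RELID is a local
stand-in for the kernel constant, and the model assumes that suspend
marks every channel's relid invalid and that a re-offer assigns a fresh
one. It shows only the spurious "not present" report when the VF offer
arrives after the check; it does not model vmbus_open().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's INVALID_RELID (assumption of this model). */
#define INVALID_RELID UINT32_MAX

/* Hypothetical, simplified channel: just a name and a relid. */
struct model_channel {
	const char *name;
	uint32_t relid;
};

/* Model assumption: suspend invalidates every channel's relid. */
static void model_suspend(struct model_channel *chans, size_t n)
{
	for (size_t i = 0; i < n; i++)
		chans[i].relid = INVALID_RELID;
}

/* A re-offer from the host gives the channel a fresh, valid relid. */
static void model_offer(struct model_channel *ch, uint32_t new_relid)
{
	assert(new_relid != INVALID_RELID);
	ch->relid = new_relid;
}

/* Mirrors the shape of the patch's loop: count channels with no relid yet. */
static int model_count_missing(const struct model_channel *chans, size_t n)
{
	int missing = 0;

	for (size_t i = 0; i < n; i++)
		if (chans[i].relid == INVALID_RELID)
			missing++;
	return missing;
}

/*
 * Run one resume. If vf_offer_before_check is zero, the VF offer arrives
 * only after the missing-offer check has already run.
 * Returns the number of channels the check would report as missing.
 */
static int model_resume(int vf_offer_before_check)
{
	struct model_channel chans[] = {
		{ "storage", 1 }, { "network", 2 }, { "vf", 3 },
	};
	size_t n = sizeof(chans) / sizeof(chans[0]);
	int missing;

	model_suspend(chans, n);

	/* Boot-time offers arrive before the all-offers-received message. */
	model_offer(&chans[0], 11);
	model_offer(&chans[1], 12);
	if (vf_offer_before_check)
		model_offer(&chans[2], 13);

	missing = model_count_missing(chans, n);

	/* Late VF offer: the channel is fine, but was already reported. */
	if (!vf_offer_before_check)
		model_offer(&chans[2], 13);

	return missing;
}
```

In this model, model_resume(1) reports nothing missing, while
model_resume(0) reports one channel even though the VF is offered a
moment later.
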
Michael
>  	/* Reset the event for the next suspend. */
>  	reinit_completion(&vmbus_connection.ready_for_suspend_event);
>
> --
> 2.34.1