[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc5b1aaf-62cc-49ea-9fc7-c07b3afbd714@linux.microsoft.com>
Date: Wed, 13 Nov 2024 14:17:15 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>,
"K . Y . Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>
Cc: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
John Starks <jostarks@...rosoft.com>,
"jacob.pan@...ux.microsoft.com" <jacob.pan@...ux.microsoft.com>,
Easwar Hariharan <eahariha@...ux.microsoft.com>,
Saurabh Singh Sengar <ssengar@...ux.microsoft.com>
Subject: Re: [PATCH v2 2/2] Drivers: hv: vmbus: Log on missing offers
On 11/12/2024 8:43 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@...ux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM
>>
>> On 11/7/2024 11:14 AM, Naman Jain wrote:
>>>
>>> On 11/1/2024 12:44 AM, Michael Kelley wrote:
>>>> From: Naman Jain <namjain@...ux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>>>>
>
> [snip]
>
>>>>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
>>>>>
>>>>> vmbus_request_offers();
>>>>>
>>>>> + mutex_lock(&vmbus_connection.channel_mutex);
>>>>> + list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>>>>> + if (channel->offermsg.child_relid != INVALID_RELID)
>>>>> + continue;
>>>>> +
>>>>> + /* hvsock channels are not expected to be present. */
>>>>> + if (is_hvsock_channel(channel))
>>>>> + continue;
>>>>> +
>>>>> + pr_err("channel %pUl/%pUl not present after resume.\n",
>>>>> + &channel->offermsg.offer.if_type,
>>>>> + &channel->offermsg.offer.if_instance);
>>>>> + /* ToDo: Cleanup these channels here */
>>>>> + }
>>>>> + mutex_unlock(&vmbus_connection.channel_mutex);
>>>>> +
>>>>
>>>> Dexuan and John have explained how in Azure VMs, there should not be
>>>> any VFs assigned to the VM at the time of hibernation. So the above
>>>> check for missing offers does not trigger an error message due to
>>>> VF offers coming after the all-offers-received message.
>>>>
>>>> But what about the case of a VM running on a local Hyper-V? I'm not
>>>> completely clear, but in that case I don't think any VFs are removed
>>>> before the hibernation, especially for VM-initiated hibernation. It's
>>>
>>> I am not sure about this behavior. I have requested Dexuan offline
>>> for a comment.
>>>
>>>> a reasonable scenario to later resume that same VM, with the same
>>>> VF assigned to the VM. Because of the way current code counts
>>>> the offers, vmbus_bus_resume() waits for the VF to be offered again,
>>>> and all the channels get correct post-resume relids. But the changes
>>>> in this patch set break that scenario. Since vmbus_bus_resume() now
>>>> proceeds before the VF offer arrives, hv_pci_resume() calling
>>>> vmbus_open() could use the pre-hibernation relid for the VF and break
>>>> things. Certainly the "not present after resume" error message would
>>>> be spurious.
>>>>
>>>> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
>>>> case with a VF to not work pending later fixes. But I thought I'd call
>>>> out the potential issue (assuming my thinking is correct).
>>>>
>>>> Michael
>>>
>>> IIUC, below scenarios can happen based on your comment-
>>>
>>> Case 1:
>>> VF channel offer is received in time before hv_pci_resume() and there
>>> are no problems.
>>>
>>> Case 2:
>>> Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
>>> is printed that this VF channel is not present after resume.
>>> Then two scenarios can happen:
>>> Case 2.1:
>>> VF channel offer is received before hv_pci_resume() and things work
>>> fine. Warning printed is spurious.
>>> Case 2.2:
>>> VM channel offer is not received before hv_pci_resume() and relid is
>>> not yet restored by onoffer. This is a problem. Warning is printed in
>>> this case for missing offer.
>>>
>>> I think it all depends on whether or not VFs are removed in local
>>> HyperV VMs. I'll try to get this information. Thanks for pointing this
>>> out.
>>>
>>> Regards,
>>> Naman
>>>
>>
>> Hi Michael,
>> I discussed with Dexuan and we tried these scenarios. Here are the
>> observations:
>>
>> For the two ways of host initiated hibernation:
>>
>> #1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
>> OR
>> #2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
>> ability)
>
> Question: What Powershell module provides "Invoke-Hibernate"? It's not
> present on my Windows 11 system that is running Hyper-V, and I can't
> find any documentation about it on the web. Or maybe Invoke-Hibernate
> is a Powershell *script*?
Powertest is one of the internal packages, which has some commands to
trigger host-initiated hibernation. I should probably have mentioned
that in my original comment.
>
>>
>> #1 does not remove the VF before sending the hibernate message to the VM
>> via hv_utils, but #2 does.
>>
>> With both #1 and #2, during resume, the host offers the vPCI vmbus
>> device before the vmbus_onoffers_delivered() is called. Whether or not
>> VFs are removed doesn't matter here, because during resume the first
>> fresh kernel always requests the VF device, meaning it has become a
>> boot-time device when the 'old' kernel is resuming back. So the issue we
>> are discussing will not happen in practice and the patch won't break
>> things and won't print spurious warnings. If its OK, please let me know,
>> I'll then proceed with v3.
>>
>
> Ah, this is interesting. I'm assuming these are the details:
>
> 1) VM boots with the intent of resuming from hibernation (though
> Hyper-V doesn't know about that intent)
> 2) Original fresh kernel is loaded and begins initialization
> 3) VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
> 4) ALLOFFERS_DELIVERED message comes in
> 5) The storvsc driver initializes for the virtual disks on the VM
> 6) Kernel initialization code finds and reads the swap space to see if a
> hibernation image is present. If so, it reads in the hibernation image.
> 7) The suspend sequence is initiated (just like during hibernation)
> to shutdown the VMBus devices and terminate the VMBus connection.
> 8) Control is transferred to the previously read-in hibernation image
> 9) The hibernation image runs the resume sequence, which
> initiates a new VMBus connection and requests offers
> 10) VMBus offers come in for whatever VMBus devices were present
> when Step 7 initiated the suspend sequence. If a VF device was present
> at that time, an offer for that VF device will come in and will match up
> with the VF that was present in the VM at the time of hibernation.
> 11) ALLOFFERS_DELIVERED message comes in again for the
> newly initiated VMBus connection.
>
3), 4) works differently IMO. There is no request_for_offers, or
ALLOFFERS_DELIVERED for fresh kernel. Otherwise on adding the prints in
kernel, we should have seen these function calls *twice* in one
hibernation-resume cycle. But that is not the case.
When the older/original kernel boots up, and requests offers, it gets
those VF offers again as part of boot time offers, and then
ALLOFFERS_DELIVERED msg comes. I'm still trying to figure out how fresh
kernel requests for VF offers or if it gets those offers automatically
from the host. I will update my findings so that it can be put up in
documentation which you mentioned.
> The netvsc driver gets initialized *after* step 4, but we don't know
> exactly *when* relative to the storvsc driver. The netvsc driver must
> tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
> sometime after that. While this netvsc/VF sequence is happening, the
> storvsc driver is reading the hibernation image from swap (Step 6).
>
Maybe this is how fresh kernel gets the offers for VF devices.
> I think the sequence you describe works when reading the
> hibernation image from swap takes 10's of seconds, or even several
> minutes in an Azure VM with a remote disk. That gives plenty
> of time for the VF to get initialized and be fully present when Step 7
> starts. But there's no *guarantee* that the VF is initialized by then.
> It's also not clear to me what action by the guest causes Hyper-V to
> treat the VF as "added to the VM" so that in Step 10 the VF offer is
> sent before ALLOFFERS_DELIVERED.
>
> The sequence you describe also happens in an Azure VM, even if
> the VF is removed before hibernation. When the VF offer arrives
> during Step 10, it doesn't match with any VFs that were in the VM
> at the time of hibernation. It's treated as a new device, just like it
> would be if the offer arrived after ALLOFFERS_DELIVERED.
>
> But it seems like there's still the risk of having a fast swap disk
> and a small hibernation image that can be read in a shorter amount
> of time than it takes to initialize the VF to the point that Hyper-V
> treats it as added to the VM. Without knowing what that point is,
> it's hard to assess the likelihood of that happening. Or maybe there's
> an interlock I'm not aware of that ensures Step 7 can't proceed
> while the netvsc/VF sequence is in progress.
>
> So maybe it's best to proceed with this patch, and deal with the
> risk later when/if it becomes reality. I'm OK if you want to do
> that. This has been an interesting discussion that I'll try to capture
> in some high-level documentation about how Linux guests on
> Hyper-V do hibernation!
>
> Michael
I have sent v3 with the changes we discussed.
Regards,
Naman
Powered by blists - more mailing lists