[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <33061e3f-d1a8-4240-ad7e-d6ddc089f61c@linux.microsoft.com>
Date: Tue, 5 Nov 2024 11:10:54 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>,
 "K . Y . Srinivasan" <kys@...rosoft.com>,
 Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>
Cc: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 John Starks <jostarks@...rosoft.com>,
 "jacob.pan@...ux.microsoft.com" <jacob.pan@...ux.microsoft.com>,
 Easwar Hariharan <eahariha@...ux.microsoft.com>,
 Saurabh Singh Sengar <ssengar@...ux.microsoft.com>
Subject: Re: [PATCH v2 1/2] Drivers: hv: vmbus: Wait for offers during boot
On 11/1/2024 12:43 AM, Michael Kelley wrote:
> From: Naman Jain <namjain@...ux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM
>>
>> Channel offers are requested during VMBus initialization and resume
>> from hibernation. Add support to wait for all channel offers to be
> 
> Given the definition of ALLOFFERS_DELIVERED, maybe change to:
> 
> " wait for all boot-time channel offers"
Thanks for reviewing, acked.
> 
>> delivered and processed before returning from vmbus_request_offers.
>>
>> This is in analogy to a PCI bus not returning from probe until it has
>> scanned all devices on the bus.
>>
>> Without this, user mode can race with VMBus initialization and miss
>> channel offers. User mode has no way to work around this other than
>> sleeping for a while, since there is no way to know when VMBus has
>> finished processing offers.
> 
> "finished processing boot-time offers."
>
Acked.
>>
>> With this added functionality, remove earlier logic which keeps track
>> of count of offered channels post resume from hibernation. Once all
>> offers delivered message is received, no further offers are going to
>> be received.
> 
> "no further boot-time offers"
Acked.
> 
>> Consequently, logic to prevent suspend from happening
>> after previous resume had missing offers, is also removed.
>>
>> Co-developed-by: John Starks <jostarks@...rosoft.com>
>> Signed-off-by: John Starks <jostarks@...rosoft.com>
>> Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <eahariha@...ux.microsoft.com>
>> ---
>> Changes since v1:
>> https://lore.kernel.org/all/20241018115811.5530-1-namjain@linux.microsoft.com/
>> * Added Easwar's Reviewed-By tag
>> * Addressed Michael's comments:
>>    * Added explanation of all offers delivered message in comments
>>    * Removed infinite wait for offers logic, and changed it wait once.
>>    * Removed sub channel workqueue flush logic
>>    * Added comments on why MLX device offer is not expected as part of
>>      this essential boot offer list. I refrained from adding too many
> 
> You've used slightly different phrasing here ("essential boot offer list").
> Potential confusion can be avoided if the terminology is super consistent.
> I'm good with "boot-time offers" (or something else if you prefer). I'm not
> sure what "essential" means here, unless it refers to offers for VFs from
> SR-IOV NICs (like Mellanox) being excluded. But it should be fine to
> use something short like "boot-time offers" and then note the VF
> exception in the code comments as you've done.
By essential, I meant the ones that the VM is configured with.
I'll stick with "boot-time offers".
> 
>> details on it as it felt like it is beyond the scope of this patch
>> series and may not be relevant to this. However, please let me know if
>>      something needs to be added.
>> * Addressed Saurabh's comments:
>>    * Changed timeout value to 10000 ms instead of 10*10000
>>    * Changed commit msg as per suggestions
>>    * Added a comment for warning case of wait_for_completion timeout
>> ---
>>   drivers/hv/channel_mgmt.c | 55 ++++++++++++++++++++++++++++-----------
>>   drivers/hv/connection.c   |  4 +--
>>   drivers/hv/hyperv_vmbus.h | 14 +++-------
>>   drivers/hv/vmbus_drv.c    | 16 ------------
>>   4 files changed, 45 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 3c6011a48dab..a2e9ebe5bf72 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
>>   		vmbus_wait_for_unload();
>>   }
>>
>> -static void check_ready_for_resume_event(void)
>> -{
>> -	/*
>> -	 * If all the old primary channels have been fixed up, then it's safe
>> -	 * to resume.
>> -	 */
>> -	if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
>> -		complete(&vmbus_connection.ready_for_resume_event);
>> -}
>> -
>>   static void vmbus_setup_channel_state(struct vmbus_channel *channel,
>>   				      struct vmbus_channel_offer_channel *offer)
>>   {
>> @@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
>>
>>   		/* Add the channel back to the array of channels. */
>>   		vmbus_channel_map_relid(oldchannel);
>> -		check_ready_for_resume_event();
>> -
>>   		mutex_unlock(&vmbus_connection.channel_mutex);
>>   		return;
>>   	}
>> @@ -1296,13 +1284,22 @@
>> EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
>>
>>   /*
>>    * vmbus_onoffers_delivered -
>> - * This is invoked when all offers have been delivered.
>> + * CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all the essential
> 
> Again, I would drop "essential" here, as its meaning in this context isn't
> defined anywhere.
> 
Acked.
>> + * boot-time offers are delivered. Other channels can be hot added
>> + * or removed later, even immediately after the all-offers-delivered
>> + * message. A boot-time offer will be any of the virtual hardware the
>> + * VM is configured with at boot.
> 
> This is good to have a definition of "boot-time offer".  But I think some
> additional specification is appropriate.  Let me suggest the following:
> 
> The CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all
> boot-time offers are delivered. A boot-time offer is for the primary channel
> for any virtual hardware configured in the VM at the time it boots.
> Boot-time offers include offers for physical devices assigned to the VM
> via Hyper-V's Discrete Device Assignment (DDA) functionality that are
> handled as virtual PCI devices in Linux (e.g., NVMe devices and GPUs).
> Boot-time offers do not include offers for VMBus sub-channels. Because
> devices can be hot-added to the VM after it is booted, additional channel
> offers that aren't boot-time offers can be received at any time after the
> all-offers-delivered message.
> 
> SR-IOV NIC Virtual Functions (VFs) assigned to a VM are not considered
> to be assigned to the VM at boot-time, and offers for VFs may occur after
> the all-offers-delivered message. VFs are optional accelerators to the
> synthetic VMBus NIC and are effectively hot-added only after the VMBus
> NIC channel is opened (once it knows the guest can support it, via the
> sriov bit in the netvsc protocol).
> 
> [Please confirm that my suggested wording is correct!]
This explains really well. I'll rephrase the code comments. Thanks.
> 
>>    */
>>   static void vmbus_onoffers_delivered(
>>   			struct vmbus_channel_message_header *hdr)
>>   {
>> +	complete(&vmbus_connection.all_offers_delivered_event);
>>   }
>>
>>   /*
>> @@ -1578,7 +1575,8 @@ void vmbus_onmessage(struct vmbus_channel_message_header *hdr)
>>   }
>>
>>   /*
>> - * vmbus_request_offers - Send a request to get all our pending offers.
>> + * vmbus_request_offers - Send a request to get all our pending offers
>> + * and wait for all offers to arrive.
>>    */
>>   int vmbus_request_offers(void)
>>   {
>> @@ -1596,6 +1594,10 @@ int vmbus_request_offers(void)
>>
>>   	msg->msgtype = CHANNELMSG_REQUESTOFFERS;
>>
>> +	/*
>> +	 * This REQUESTOFFERS message will result in the host sending an all
>> +	 * offers delivered message.
>> +	 */
>>   	ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_message_header),
>>   			     true);
>>
>> @@ -1607,6 +1609,29 @@ int vmbus_request_offers(void)
>>   		goto cleanup;
>>   	}
>>
>> +	/*
>> +	 * Wait for the host to send all offers.
>> +	 * Keeping it as a best-effort mechanism, where a warning is
>> +	 * printed if a timeout occurs, and execution is resumed.
>> +	 */
>> +	if (!wait_for_completion_timeout(
>> +		&vmbus_connection.all_offers_delivered_event, msecs_to_jiffies(10000))) {
>> +		pr_warn("timed out waiting for all offers to be delivered...\n");
>> +	}
>> +
>> +	/*
>> +	 * Flush handling of offer messages (which may initiate work on
>> +	 * other work queues).
>> +	 */
>> +	flush_workqueue(vmbus_connection.work_queue);
>> +
>> +	/*
>> +	 * Flush workqueues for processing the incoming offers. Subchannel
> 
> s/workqueues/workqueue/
Acked.
> 
>> +	 * offers and processing can happen later, so there is no need to
>> +	 * flush those workqueues here.
> 
> s/those workqueues/that workqueue/
Acked.
> 
>> +	 */
>> +	flush_workqueue(vmbus_connection.handle_primary_chan_wq);
>> +
>>   cleanup:
>>   	kfree(msginfo);
>>
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index f001ae880e1d..8351360bba16 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -34,8 +34,8 @@ struct vmbus_connection vmbus_connection = {
>>
>>   	.ready_for_suspend_event = COMPLETION_INITIALIZER(
>>   				  vmbus_connection.ready_for_suspend_event),
>> -	.ready_for_resume_event	= COMPLETION_INITIALIZER(
>> -				  vmbus_connection.ready_for_resume_event),
>> +	.all_offers_delivered_event = COMPLETION_INITIALIZER(
>> +				  vmbus_connection.all_offers_delivered_event),
>>   };
>>   EXPORT_SYMBOL_GPL(vmbus_connection);
>>
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index d2856023d53c..80cc65dac740 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -287,18 +287,10 @@ struct vmbus_connection {
>>   	struct completion ready_for_suspend_event;
>>
>>   	/*
>> -	 * The number of primary channels that should be "fixed up"
>> -	 * upon resume: these channels are re-offered upon resume, and some
>> -	 * fields of the channel offers (i.e. child_relid and connection_id)
>> -	 * can change, so the old offermsg must be fixed up, before the resume
>> -	 * callbacks of the VSC drivers start to further touch the channels.
>> +	 * Completed once the host has offered all channels. Note that
> 
> "all boot-time channels"
Acked.
> 
> 
>> +	 * some channels may still be being process on a work queue.
>>   	 */
>> -	atomic_t nr_chan_fixup_on_resume;
>> -	/*
>> -	 * vmbus_bus_resume() waits for "nr_chan_fixup_on_resume" to
>> -	 * drop to zero.
>> -	 */
>> -	struct completion ready_for_resume_event;
>> +	struct completion all_offers_delivered_event;
>>   };
>>
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index 9b15f7daf505..bd3fc41dc06b 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2427,11 +2427,6 @@ static int vmbus_bus_suspend(struct device *dev)
>>   	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>>   		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> -	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
>> -		pr_err("Can not suspend due to a previous failed resuming\n");
>> -		return -EBUSY;
>> -	}
>> -
>>   	mutex_lock(&vmbus_connection.channel_mutex);
>>
>>   	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>> @@ -2456,17 +2451,12 @@ static int vmbus_bus_suspend(struct device *dev)
>>   			pr_err("Sub-channel not deleted!\n");
>>   			WARN_ON_ONCE(1);
>>   		}
>> -
>> -		atomic_inc(&vmbus_connection.nr_chan_fixup_on_resume);
>>   	}
>>
>>   	mutex_unlock(&vmbus_connection.channel_mutex);
>>
>>   	vmbus_initiate_unload(false);
>>
>> -	/* Reset the event for the next resume. */
>> -	reinit_completion(&vmbus_connection.ready_for_resume_event);
>> -
>>   	return 0;
>>   }
>>
>> @@ -2502,14 +2492,8 @@ static int vmbus_bus_resume(struct device *dev)
>>   	if (ret != 0)
>>   		return ret;
>>
>> -	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) == 0);
>> -
>>   	vmbus_request_offers();
>>
>> -	if (wait_for_completion_timeout(
>> -		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
>> -		pr_err("Some vmbus device is missing after suspending?\n");
>> -
>>   	/* Reset the event for the next suspend. */
>>   	reinit_completion(&vmbus_connection.ready_for_suspend_event);
>>
>> --
>> 2.34.1
Thanks,
Naman
Powered by blists - more mailing lists
 
