Message-ID: <MW2PR2101MB1052BE3C25E87FE1CA5BA0F8D7290@MW2PR2101MB1052.namprd21.prod.outlook.com>
Date:   Tue, 8 Sep 2020 21:05:34 +0000
From:   Michael Kelley <mikelley@...rosoft.com>
To:     Dexuan Cui <decui@...rosoft.com>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        vkuznets <vkuznets@...hat.com>
Subject: RE: [PATCH] Drivers: hv: vmbus: hibernation: do not hang forever in
 vmbus_bus_resume()

From: Dexuan Cui <decui@...rosoft.com> Sent: Friday, September 4, 2020 7:56 PM
> 
> After we Stop and later Start a VM that uses Accelerated Networking (NIC
> SR-IOV), the VF vmbus device's Instance GUID can currently change. As a
> result, after vmbus_bus_resume() -> vmbus_request_offers(), vmbus_onoffer()
> cannot find the original vmbus channel of the VF, so we never complete()
> vmbus_connection.ready_for_resume_event in check_ready_for_resume_event(),
> and the VM hangs in vmbus_bus_resume() forever.
> 
> Fix the issue by adding a 10-second timeout, so that resume can still
> succeed and the saved state is not lost. According to my test, the user
> can then disable Accelerated Networking and SSH into the VM for further
> recovery. Also prevent the VM in question from suspending again.
> 
> The host will be fixed so that in the future the Instance GUID stays the
> same across hibernation.
> 
> Fixes: d8bd2d442bb2 ("Drivers: hv: vmbus: Resume after fixing up old primary channels")
> Signed-off-by: Dexuan Cui <decui@...rosoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Michael Kelley <mikelley@...rosoft.com>

> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 910b6e90866c..946d0aba101f 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2382,7 +2382,10 @@ static int vmbus_bus_suspend(struct device *dev)
>  	if (atomic_read(&vmbus_connection.nr_chan_close_on_suspend) > 0)
>  		wait_for_completion(&vmbus_connection.ready_for_suspend_event);
> 
> -	WARN_ON(atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0);
> +	if (atomic_read(&vmbus_connection.nr_chan_fixup_on_resume) != 0) {
> +		pr_err("Can not suspend due to a previous failed resuming\n");
> +		return -EBUSY;
> +	}
> 
>  	mutex_lock(&vmbus_connection.channel_mutex);
> 
> @@ -2456,7 +2459,9 @@ static int vmbus_bus_resume(struct device *dev)
> 
>  	vmbus_request_offers();
> 
> -	wait_for_completion(&vmbus_connection.ready_for_resume_event);
> +	if (wait_for_completion_timeout(
> +		&vmbus_connection.ready_for_resume_event, 10 * HZ) == 0)
> +		pr_err("Some vmbus device is missing after suspending?\n");
> 
>  	/* Reset the event for the next suspend. */
>  	reinit_completion(&vmbus_connection.ready_for_suspend_event);
> --
> 2.19.1
