Message-ID: <DM5PR03MB2490DA10B212086187776661A0AD0@DM5PR03MB2490.namprd03.prod.outlook.com>
Date:   Fri, 28 Oct 2016 18:21:29 +0000
From:   KY Srinivasan <kys@...rosoft.com>
To:     Michael Gissing <mg@...lpeltz.net>,
        "Alex Ng (LIS)" <alexng@...rosoft.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
        "olaf@...fle.de" <olaf@...fle.de>,
        "apw@...onical.com" <apw@...onical.com>,
        "vkuznets@...hat.com" <vkuznets@...hat.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
Subject: RE: [PATCH] Tools: hv: recover after hv_vss_daemon freeze times out



> -----Original Message-----
> From: Michael Gissing [mailto:mg@...lpeltz.net]
> Sent: Thursday, October 13, 2016 2:27 PM
> To: Alex Ng (LIS) <alexng@...rosoft.com>
> Cc: KY Srinivasan <kys@...rosoft.com>; linux-kernel@...r.kernel.org;
> devel@...uxdriverproject.org; olaf@...fle.de; apw@...onical.com;
> vkuznets@...hat.com; gregkh@...uxfoundation.org
> Subject: [PATCH] Tools: hv: recover after hv_vss_daemon freeze times out
> 
> 
> If a FIFREEZE operation run by the hv_vss_daemon takes longer than the
> VSS_USERSPACE_TIMEOUT set in the hv_snapshot module, instead of exiting
> after a write failure, try to recover by reopening the hv_vss device and
> performing the initial handshake again. Exiting causes all subsequent VSS
> operations sent by the Hyper-V host to fail until the daemon is restarted.
> 
> Signed-off-by: Michael Gissing <mg@...lpeltz.net>
> 
> ---
>   tools/hv/hv_vss_daemon.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
> index 5d51d6f..0ecbdab 100644
> --- a/tools/hv/hv_vss_daemon.c
> +++ b/tools/hv/hv_vss_daemon.c
> @@ -176,6 +176,7 @@ int main(int argc, char *argv[])
>       openlog("Hyper-V VSS", 0, LOG_USER);
>       syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());
> 
> +recover:
>       vss_fd = open("/dev/vmbus/hv_vss", O_RDWR);
>       if (vss_fd < 0) {
>           syslog(LOG_ERR, "open /dev/vmbus/hv_vss failed; error: %d %s",
> @@ -196,6 +197,7 @@ int main(int argc, char *argv[])
>       }
> 
>       pfd.fd = vss_fd;
> +    in_handshake = 1;
> 
>       while (1) {
>           pfd.events = POLLIN;
> @@ -258,7 +260,14 @@ int main(int argc, char *argv[])
>           if (len != sizeof(struct hv_vss_msg)) {
>               syslog(LOG_ERR, "write failed; error: %d %s", errno,
>                      strerror(errno));
> -            exit(EXIT_FAILURE);
> +            /*
> +             * try to recover from possible timeout by THAWing
> +             * and restarting the message loop
> +            */
> +            vss_operate(VSS_OP_THAW);
> +            close(vss_fd);
> +            syslog(LOG_INFO, "trying to recover VSS connection");
> +            goto recover;
>           }
>       }

I agree with issuing a THAW command when we time out in the kernel, as this would leave
the file system in a sane state. That said, I am not sure why we need to close the fd and reinitialize
everything in the daemon. What if we just ignored the write error and went back to waiting for new commands
from the host?
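
A minimal sketch of that alternative, for illustration only: on a short or failed write, thaw and keep using the same fd instead of closing it and jumping back to the handshake. The names vss_handle_write_result, stub_thaw and thaw_calls are hypothetical and are not part of hv_vss_daemon.c; the callback stands in for vss_operate(VSS_OP_THAW).

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for vss_operate(VSS_OP_THAW); only counts calls. */
static int thaw_calls;

static int stub_thaw(void)
{
	thaw_calls++;
	return 0;
}

/*
 * Hypothetical helper: decide what to do with the result of write().
 * Returns 0 if the full response was written, 1 if the write failed and
 * a THAW was issued. In both cases the caller keeps the existing fd and
 * goes back to poll() for the next command from the host.
 */
static int vss_handle_write_result(ssize_t len, size_t expected,
				   int (*thaw)(void))
{
	if (len == (ssize_t)expected)
		return 0;

	/* Likely a kernel-side timeout: leave the file systems thawed. */
	thaw();
	return 1;
}
```

In the message loop this would replace the exit(EXIT_FAILURE) branch with a call such as vss_handle_write_result(len, sizeof(struct hv_vss_msg), stub_thaw), followed by a plain continue. Whether the kernel side accepts further reads on the same fd after a timeout is an assumption here and would need to be checked against hv_snapshot.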

Regards,

K. Y  
> 
> --
> 2.7.4
> 
