Message-ID: <KL1P15301MB0006B36AD458D43907DF82B0BF6D0@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM>
Date: Tue, 29 May 2018 19:58:09 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: "Michael Kelley (EOSG)" <Michael.H.Kelley@...rosoft.com>,
'Lorenzo Pieralisi' <lorenzo.pieralisi@....com>,
'Bjorn Helgaas' <bhelgaas@...gle.com>,
"'linux-pci@...r.kernel.org'" <linux-pci@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"'olaf@...fle.de'" <olaf@...fle.de>,
"'apw@...onical.com'" <apw@...onical.com>,
"'jasowang@...hat.com'" <jasowang@...hat.com>
CC: "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
"'driverdev-devel@...uxdriverproject.org'"
<driverdev-devel@...uxdriverproject.org>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"'vkuznets@...hat.com'" <vkuznets@...hat.com>,
"'marcelo.cerri@...onical.com'" <marcelo.cerri@...onical.com>
Subject: RE: [PATCH] PCI: hv: Do not wait forever on a device that has
disappeared
> From: Michael Kelley (EOSG)
> Sent: Monday, May 28, 2018 17:19
>
> While this patch solves the immediate problem of getting hung waiting
> for a response from Hyper-V that will never come, there's another scenario
> to look at that I think introduces a race. Suppose the guest VM issues a
> vmbus_sendpacket() request in one of the cases covered by this patch,
> and suppose that Hyper-V queues a response to the request, and then
> immediately follows with a rescind request. Processing the response will
> get queued to a tasklet associated with the channel, while processing the
> rescind will get queued to a tasklet associated with the top-level vmbus
> connection. From what I can see, the code doesn't impose any ordering
> on processing the two. If the rescind is processed first, the new
> wait_for_response() function may wake up, notice the rescind flag, and
> return an error. Its caller will return an error, and in doing so pop the
> completion packet off the stack. When the response is processed later,
> it will try to signal completion via a completion packet that no longer
> exists, and memory corruption will likely occur.
>
> Am I missing anything that would prevent this scenario from happening?
> It is admittedly low probability, and a solution seems non-trivial. I haven't
> looked specifically, but a similar scenario is probably possible with the
> drivers for other VMbus devices. We should work on a generic solution.
>
> Michael
Thanks for spotting the race!
IMO we can disable the per-channel tasklet to close the race window:
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -565,6 +565,7 @@ static int wait_for_response(struct hv_device *hdev,
 {
 	while (true) {
 		if (hdev->channel->rescind) {
+			tasklet_disable(&hdev->channel->callback_event);
 			dev_warn_once(&hdev->device, "The device is gone.\n");
 			return -ENODEV;
 		}
This way, when we exit the loop, we're sure hv_pci_onchannelcallback() cannot
run anymore, because tasklet_disable() waits for a running tasklet to finish
before returning. What do you think of this?
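
To make the race concrete, here is a minimal sketch of the caller side
(abridged from the hv_pci_protocol_negotiation()-style callers in
pci-hyperv.c; packet setup and error handling are omitted, so treat the
details as illustrative):

	struct pci_packet *pkt;
	struct hv_pci_compl comp_pkt;
	int ret;

	pkt = kzalloc(sizeof(*pkt) + sizeof(struct pci_version_request),
		      GFP_KERNEL);
	if (!pkt)
		return -ENOMEM;

	/* The completion lives on this function's stack. */
	init_completion(&comp_pkt.host_event);
	pkt->completion_func = hv_pci_generic_compl;
	pkt->compl_ctxt = &comp_pkt;

	ret = vmbus_sendpacket(hdev->channel, &pkt->message,
			       sizeof(struct pci_version_request),
			       (unsigned long)pkt, VM_PKT_DATA_INBAND,
			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
	if (!ret)
		ret = wait_for_response(hdev, &comp_pkt.host_event);

	/*
	 * If wait_for_response() saw the rescind flag and returned
	 * -ENODEV, we return from here and comp_pkt is popped off the
	 * stack.  If the channel tasklet later runs
	 * hv_pci_onchannelcallback() for the already-queued response,
	 * it calls complete() on the dead stack completion -- the
	 * corruption Michael describes.  Disabling the tasklet before
	 * returning from wait_for_response() closes that window.
	 */
	return ret;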
It looks like the list of the other vmbus devices that can be hot-removed is:
the hv_utils devices
hv_sock devices
storvsc device
netvsc device
As far as I checked, the first three types of devices don't have this "send a
request to the host and wait forever for the response" pattern. NetVSC has the
same pattern, so it should be fixed as well.
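
For reference, the rescind-aware wait from the patch under discussion looks
roughly like this (netvsc would need an equivalent; callers invoke it right
after vmbus_sendpacket()):

static int wait_for_response(struct hv_device *hdev,
			     struct completion *comp)
{
	while (true) {
		/* Bail out once the host has rescinded the channel. */
		if (hdev->channel->rescind) {
			dev_warn_once(&hdev->device, "The device is gone.\n");
			return -ENODEV;
		}

		/* Poll every 1/10 second so a rescind is noticed soon. */
		if (wait_for_completion_timeout(comp, HZ / 10))
			break;
	}

	return 0;
}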
-- Dexuan