netdev - Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20161021.112705.350483683770661439.davem@davemloft.net>
Date:   Fri, 21 Oct 2016 11:27:05 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     vkuznets@...hat.com
Cc:     sthemmin@...rosoft.com, netdev@...r.kernel.org,
        devel@...uxdriverproject.org, linux-kernel@...r.kernel.org,
        kys@...rosoft.com, haiyangz@...rosoft.com
Subject: Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send()
 and netvsc_init_buf()

From: Vitaly Kuznetsov <vkuznets@...hat.com>
Date: Fri, 21 Oct 2016 17:17:18 +0200

> David Miller <davem@...emloft.net> writes:
> 
>> From: Vitaly Kuznetsov <vkuznets@...hat.com>
>> Date: Fri, 21 Oct 2016 13:15:53 +0200
>>
>>> David Miller <davem@...emloft.net> writes:
>>> 
>>>> From: Vitaly Kuznetsov <vkuznets@...hat.com>
>>>> Date: Thu, 20 Oct 2016 10:51:04 +0200
>>>>
>>>>> Stephen Hemminger <sthemmin@...rosoft.com> writes:
>>>>> 
>>>>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>>>>
>>>>> 
>>>>> I think we don't: this is the only place in the function where we read
>>>>> the variable so we'll get normal read. We're not trying to syncronize
>>>>> with netvsc_init_buf() as that would require locking, if we read stale
>>>>> NULL value after it was already updated on a different CPU we're fine,
>>>>> we'll just return -EAGAIN.
>>>>
>>>> The concern is if we race with netvsc_destroy_buf() and this pointer
>>>> becomes NULL after the test you are adding.
>>> 
>>> Thanks, this is interesting.
>>> 
>>> We may reach to netvsc_destroy_buf() by 3 pathes:
>>> 
>>> 1) cleanup path in netvsc_init_buf(). It is never called with
>>> send_section_map being not NULL so it seems we're safe.
>>> 
>>> 2) from netvsc_remove() when the device is being removed. As far as I
>>> understand we can't be on the transmit path after we call
>>> unregister_netdev() so we're safe.
>>> 
>>> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
>>> specific to netvsc driver as basically we need to remove the device and
>>> add it back to change mtu/number of channels. The underligning 'struct
>>> net_device' stays but the rest is being removed and added back. On both
>>> pathes we first call netvsc_close() which does netif_tx_disable() and as
>>> far as I understand (I may be wrong) this guarantees that we won't be in
>>> netvsc_send().
>>> 
>>> So *I think* that we can't ever free send_section_map while in
>>> netvsc_send() and we can't even get to netvsc_send() after it is freed
>>> but I may have missed something.
>>
>> Ok you may be right.
>>
>> Can't the device be taken down by asynchronous events as well?  For example
>> if the peer end of the interface in the other guest disappears.
> 
> The device may be hot removed from host's side. In this case we follow
> the following call chain:
> 
> ... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove()
> 
>  and netvsc_remove() does netif_tx_disable(); unregister_netdev();
> before calling rndis_filter_device_remove() leading to netvsc_destroy_buf().
> 
> So it seems we can't be in netvsc_send() when the device is
> disappearing.

Ok, it all looks good then.