[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19f503e369b04d01b79a1bde866a39f8@SIXPR30MB031.064d.mgd.msft.net>
Date: Tue, 14 Jul 2015 11:41:44 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>
CC: KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on
device shutdown
> -----Original Message-----
> From: Vitaly Kuznetsov
> Sent: Monday, July 13, 2015 20:19
> Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device
> shutdown
>
> When a new subchannel offer from host comes during device shutdown (e.g.
> when a netvsc/storvsc module is unloadedshortly after it was loaded) a
> crash can happen as vmbus_process_offer() is not anyhow serialized with
> vmbus_remove().
How about vmbus_onoffer_rescind()?
It's not serialized with vmbus_remove() either, so I think there is an issue too?
I remember when 'rmmod hv_netvsc', we get a rescind-offer message for
each subchannel.
> As an example we can try calling subchannel create
> callback when the module is already unloaded.
> The following crash was observed while keeping loading/unloading hv_netvsc
> module on 64-CPU guest:
>
> hv_netvsc vmbus_14: net device safe to remove
> BUG: unable to handle kernel paging request at 000000000000a118
> IP: [<ffffffffa01c1b21>] netvsc_sc_open+0x31/0xb0 [hv_netvsc]
> PGD 1f3946a067 PUD 1f38a5f067 PMD 0
> Oops: 0000 [#1] SMP
> ...
> Call Trace:
> [<ffffffffa0055cd7>] vmbus_onoffer+0x477/0x530 [hv_vmbus]
> [<ffffffff81092e1f>] ? move_linked_works+0x5f/0x80
> [<ffffffffa0055fd3>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
> [<ffffffffa0052f81>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
> [<ffffffff81095cde>] process_one_work+0x18e/0x4e0
> ...
>
> The issue cannot be solved by just resetting sc_creation_callback on
> driver removal as while we search for the parent channel with channel_lock
> held we release it after the channel was found and it can disapper beneath
> our feet while we're still in vmbus_process_offer();
>
> Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure
> no new subchannels are created after we started the removal procedure.
> Check its state with mutex_trylock in vmbus_process_offer().
In my 8-CPU VM, I can very easily reproduce the panic by
1. running
while ((1)); do modprobe -r hv_netvsc; modprobe hv_netvsc; sleep 10; done.
and
2. in vmbus_onoffer_rescind(), we sleep 3s after a subchannel is added into
the primary channel's sc_list (and before the sc_creation_callback is invoked):
(I added line 275)
262 if (!fnew) {
263 /*
264 * Check to see if this is a sub-channel.
265 */
266 if (newchannel->offermsg.offer.sub_channel_index != 0) {
267 /*
268 * Process the sub-channel.
269 */
270 newchannel->primary_channel = channel;
271 spin_lock_irqsave(&channel->lock, flags);
272 list_add_tail(&newchannel->sc_list, &channel->sc_list);
273 channel->num_sc++;
274 spin_unlock_irqrestore(&channel->lock, flags);
275 ssleep(3);
276 } else
277 goto err_free_chan;
278 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists