[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <PU1P153MB01696D28CDDC098307EBD5AFBF980@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
Date: Tue, 22 Jan 2019 18:40:30 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: kimbrownkd <kimbrownkd@...il.com>
CC: Michael Kelley <mikelley@...rosoft.com>,
Long Li <longli@...rosoft.com>,
Sasha Levin <Alexander.Levin@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show
functions
> From: Kimberly Brown <kimbrownkd@...il.com>
> Sent: Monday, January 21, 2019 10:43 PM
> > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > kobject *kobj,
> > > if (chan->state != CHANNEL_OPENED_STATE)
> > > return -EINVAL;
> > >
> > > - return attribute->show(chan, buf);
> > > + mutex_lock(&vmbus_connection.channel_mutex);
> > > + ret = attribute->show(chan, buf);
> > > + mutex_unlock(&vmbus_connection.channel_mutex);
> > > + return ret;
> > > }
> >
> > It looks this patch is already able to fix the original issue:
> > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > as chan->state can't be CHANNEL_OPENED_STATE when
> > channel->ringbuffer_page is not set up yet, or has been freed.
> > -- Dexuan
>
> I think that patch 6712cc9c2211 fixes the problem when the channel is
> not set up yet, but it doesn't fix the problem when the channel is being
> closed
Yeah, now I see your point.
> The channel could be freed after the check that "chan->state" is
> CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.
IMO the channel struct itself can't be freed while the "attribute->show()"
function is running, because I suppose the sysfs interface should have an extra
reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
files are removed, the channel struct itself can't disappear.
(I didn't check the related code very carefully, so I could be wrong. :-) )
But as you pointed, at least for sub-channels, channel->ringbuffer_page
can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
the "attribute->show()" could crash when the race happens.
Adding channel_mutex here seems to be able to fix the race for
sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().
For a primary channel, vmbus_close() -> vmbus_free_ring() can still
free channel->ringbuffer_page without notifying the "attribute->show()".
We may also need to acquire/release the channel_mutex in vmbus_close()?
> Actually, there should be checks that "chan" is not null and that
> "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> need to fix that.
I suppose "chan" can not be NULL here (see the above).
Checking "chan->state" may not help to completely fix the race, because
there is no locking/synchronization code in
vmbus_close_internal() when we test and change "channel->state".
I guess we may need to check if channel->ringbuffer_page is NULL in
the "attribute->show()".
PS, to prove that a race condition does exist and can really cause a panic or
something, I usually add some msleep() delays in different paths so that I
can reproduce the crash every time I take a special action, e.g. when I read
the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
a patch can indeed help, at least it can fix the crash which would happen
without the patch. :-)
-- Dexuan
Powered by blists - more mailing lists