linux-kernel - RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <PU1P153MB01699B96311157EA9DF515E6BF970@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
Date:   Tue, 29 Jan 2019 19:20:28 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     kimbrownkd <kimbrownkd@...il.com>
CC:     Michael Kelley <mikelley@...rosoft.com>,
        Long Li <longli@...rosoft.com>,
        Sasha Levin <Alexander.Levin@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show
 functions

> From: Kimberly Brown <kimbrownkd@...il.com>
> > ... 
> > But as you pointed, at least for sub-channels, channel->ringbuffer_page
> > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> > the "attribute->show()" could crash when the race happens.
> > Adding channel_mutex here seems to be able to fix the race for
> > sub-channels, as the same channel_mutex is used in
> vmbus_disconnect_ring().
> >
> > For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> > free channel->ringbuffer_page without notifying the "attribute->show()".
> > We may also need to acquire/release the channel_mutex in vmbus_close()?
> >
> > > Actually, there should be checks that "chan" is not null and that
> > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > > need to fix that.
> > I suppose "chan" can not be NULL here (see the above).
> >
> > Checking "chan->state" may not help to completely fix the race, because
> > there is no locking/synchronization code in
> > vmbus_close_internal() when we test and change "channel->state".
> >
> 
> The calls to vmbus_close_internal() for the subchannels and the primary
> channel are protected with channel_mutex in vmbus_disconnect_ring().
> This prevents "channel->state" from changing while "attribute->show()" is
> running.
Ah, I think you're right. 
 
> > I guess we may need to check if channel->ringbuffer_page is NULL in
> > the "attribute->show()".
> >
> 
> For the primary channel, vmbus_free_ring() is called after the
> return from vmbus_disconnect_ring(). Therefore, the primary channel's
> state is changed before "channel->ringbuffer_page" is set to NULL.
> Checking the channel state should be sufficient to prevent the ring
> buffers from being freed while "attribute->show()" is running. The
> ring buffers can't be freed until the channel's state is changed, and
> the channel state change is protected by the mutex.
I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutax held).

> I think checking that "channel->ringbuffer_page" is not NULL would
> also work, but, as you stated, we would need to aquire/release
> channel_mutex in vmbus_close().
Then it looks unnecessary to check "channel->ringbuffer_page".
 
> > PS, to prove that a race condition does exist and can really cause a panic or
> > something, I usually add some msleep() delays in different paths so that I
> > can reproduce the crash every time I take a special action, e.g. when I read
> > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can prove
> > a patch can indeed help, at least it can fix the crash which would happen
> > without the patch. :-)
> >
> 
> Thanks! I was able to free the ring buffers while "attribute->show()"
> was running, which caused a null pointer dereference bug. As expected,
> the mutex lock fixed it.
Awesome!

-- Dexuan