Message-ID: <1538044281.19334.4.camel@redhat.com>
Date: Thu, 27 Sep 2018 12:31:21 +0200
From: Mohammed Gamal <mgamal@...hat.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"otubo@...hat.com" <otubo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
vkuznets <vkuznets@...hat.com>
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Thu, 2018-09-27 at 12:23 +0200, Stephen Hemminger wrote:
> On Thu, 27 Sep 2018 10:57:05 +0200
> Mohammed Gamal <mgamal@...hat.com> wrote:
>
> > On Wed, 2018-09-26 at 17:13 +0000, Haiyang Zhang wrote:
> > > > -----Original Message-----
> > > > From: Mohammed Gamal <mgamal@...hat.com>
> > > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > > To: Stephen Hemminger <sthemmin@...rosoft.com>;
> > > > netdev@...r.kernel.org
> > > > Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> > > > <haiyangz@...rosoft.com>; vkuznets <vkuznets@...hat.com>;
> > > > otubo@...hat.com; cavery <cavery@...hat.com>;
> > > > linux-kernel@...r.kernel.org; devel@...uxdriverproject.org;
> > > > Mohammed Gamal <mgamal@...hat.com>
> > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > > > on send
> > > >
> > > > During high network traffic, changes to network interface
> > > > parameters such as the number of channels or the MTU can cause a
> > > > kernel panic with a NULL pointer dereference. This is due to
> > > > netvsc_device_remove() being called and deallocating the channel
> > > > ring buffers, which can then be accessed by netvsc_send_pkt()
> > > > before they are allocated again by netvsc_device_add().
> > > >
> > > > The patch fixes this problem by checking the channel state and
> > > > returning ENODEV if the channel is not yet opened. We also move the
> > > > call to hv_ringbuf_avail_percent(), which may access the
> > > > uninitialized ring buffer.
> > > >
> > > > Signed-off-by: Mohammed Gamal <mgamal@...hat.com>
> > > > ---
> > > > drivers/net/hyperv/netvsc.c | 7 ++++++-
> > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> > > > index fe01e14..75f1b31 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > > >  	struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
> > > >  	u64 req_id;
> > > >  	int ret;
> > > > -	u32 ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);
> > > >
> > > > +	u32 ring_avail;
> > > > +
> > > > +	if (out_channel->state != CHANNEL_OPENED_STATE)
> > > > +		return -ENODEV;
> > > > +
> > > > +	ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound);
> > >
> > > When you reproduce the NULL ptr panic, does your kernel include
> > > the following patch?
> > > hv_netvsc: common detach logic
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> > >
> >
> > Yes, it is included. The commit did reduce the occurrence of this
> > race condition, but it still occurs, albeit rarely.
> >
> > > We call netif_tx_disable(ndev) and netif_device_detach(ndev)
> > > before changing the MTU or the number of channels, so there should
> > > be no call to start_xmit() when the channel is not ready.
> > >
> > > If you find that the check for CHANNEL_OPENED_STATE is still
> > > necessary on the upstream kernel (including the patch "common
> > > detach logic"), we should debug the code further and find the
> > > root cause.
> > >
> > > Thanks,
> > > - Haiyang
> > >
> >
>
> Is there some workload that can be used to reproduce this?
> The stress test from Vitaly, which changes parameters while running
> network traffic, passes now.
>
> Can you reproduce this with the upstream current kernel?
>
> Adding the check in start xmit is still racy, and won't cure the
> problem.
>
> Another solution would be to add a grace period in the netvsc detach
> logic.
>
Steps to reproduce are listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1632653
We've also managed to reproduce the same issue upstream. It's more
likely to be reproduced on Windows 2012R2 than 2016.
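
For readers following the thread, here is a rough userspace sketch of the
check the patch adds. It is not the driver code: the types and names
(struct channel, struct ring, send_pkt, ring_avail_percent) are hypothetical
stand-ins for the vmbus channel, its outbound ring buffer,
netvsc_send_pkt() and hv_get_avail_to_write_percent(). It only shows the
ordering the patch relies on: test the channel state first, and touch the
ring only after that test has passed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the vmbus channel types. */
enum chan_state { CHAN_CLOSED, CHAN_OPENED };

struct ring {
	unsigned int size;	/* total bytes in the ring */
	unsigned int used;	/* bytes currently queued */
};

struct channel {
	enum chan_state state;
	struct ring *outbound;	/* NULL while the device is torn down */
};

/* Percentage of the ring free for writing; r must be non-NULL, size > 0. */
static unsigned int ring_avail_percent(const struct ring *r)
{
	return (r->size - r->used) * 100 / r->size;
}

/*
 * Queue len bytes on the channel. The state check comes before any
 * access to ch->outbound, mirroring the order introduced by the patch.
 */
static int send_pkt(struct channel *ch, unsigned int len)
{
	unsigned int avail;

	if (ch->state != CHAN_OPENED)
		return -ENODEV;

	/* Only read the ring after the state check has passed. */
	avail = ring_avail_percent(ch->outbound);
	if (avail == 0 || ch->outbound->size - ch->outbound->used < len)
		return -EAGAIN;

	ch->outbound->used += len;
	return 0;
}
```

As Stephen points out above, a check like this only narrows the window. In
the real driver nothing stops the teardown path from running between the
state test and the ring access, so the sketch should not be read as a
complete fix for the race.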
Regards,
Mohammed