lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1521131053.30828.1.camel@redhat.com>
Date:   Thu, 15 Mar 2018 17:24:13 +0100
From:   Mohammed Gamal <mgamal@...hat.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     netdev@...r.kernel.org, sthemmin@...rosoft.com, otubo@...hat.com,
        linux-kernel@...r.kernel.org, devel@...uxdriverproject.org,
        vkuznets@...hat.com, davem@...emloft.net
Subject: Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote:
> On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:
> > On Tue, 13 Mar 2018 20:06:50 +0100
> > Mohammed Gamal <mgamal@...hat.com> wrote:
> > 
> > > Dring high network traffic changes to network interface
> > > parameters
> > > such as number of channels or MTU can cause a kernel panic with a
> > > NULL
> > > pointer dereference. This is due to netvsc_device_remove() being
> > > called and deallocating the channel ring buffers, which can then
> > > be
> > > accessed by netvsc_send_pkt() before they're allocated on calling
> > > netvsc_device_add()
> > > 
> > > The patch fixes this problem by checking the channel state and
> > > returning
> > > ENODEV if not yet opened. We also move the call to
> > > hv_ringbuf_avail_percent()
> > > which may access the uninitialized ring buffer.
> > > 
> > > Signed-off-by: Mohammed Gamal <mgamal@...hat.com>
> > > ---
> > >  drivers/net/hyperv/netvsc.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/net/hyperv/netvsc.c
> > > b/drivers/net/hyperv/netvsc.c
> > > index 0265d70..44a8358 100644
> > > --- a/drivers/net/hyperv/netvsc.c
> > > +++ b/drivers/net/hyperv/netvsc.c
> > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> > >  	struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > packet->q_idx);
> > >  	u64 req_id;
> > >  	int ret;
> > > -	u32 ring_avail = hv_ringbuf_avail_percent(&out_channel-
> > > > outbound);
> > > 
> > > +	u32 ring_avail;
> > >  
> > >  	nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> > >  	if (skb)
> > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> > >  
> > >  	req_id = (ulong)skb;
> > >  
> > > -	if (out_channel->rescind)
> > > +	if (out_channel->rescind || out_channel->state !=
> > > CHANNEL_OPENED_STATE)
> > >  		return -ENODEV;
> > >  
> > >  	if (packet->page_buf_cnt) {
> > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> > >  				       VMBUS_DATA_PACKET_FLAG_CO
> > > MP
> > > LETION_REQUESTED);
> > >  	}
> > >  
> > > +	ring_avail = hv_ringbuf_avail_percent(&out_channel-
> > > > outbound);
> > > 
> > >  	if (ret == 0) {
> > >  		atomic_inc_return(&nvchan->queue_sends);
> > >  
> > 
> > Thanks for your patch. Yes there are races with the current update
> > logic. The root cause goes higher up in the flow; the send queues
> > should
> > be stopped before netvsc_device_remove is called. Solving it where
> > you tried
> > to is racy and not going to work reliably.
> > 
> > Network patches should go to netdev@...r.kernel.org
> > 
> > You can't move the ring_avail check until after the
> > vmbus_sendpacket
> > because
> > that will break the flow control logic.
> > 
> 
> Why? I don't see ring_avail being used before that point.

Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and
that means that ring_avail value will be different than the expected.

> 
> > Instead, you should just move the avail_read check until just after
> > the existing rescind
> > check.
> > 
> > Also, you shouldn't need to check for OPENED_STATE, just rescind is
> > enough.
> 
> That rarely mitigated the race. channel->rescind flag is set on vmbus
> exit - called on module unload - and when a rescind offer is received
> from the host, which AFAICT doesn't happen on every call to
> netvsc_device_remove, so it's quite possible that the ringbuffer is
> accessed before it's allocated again on channel open and hence the
> check for OPENED_STAT - which is only set after all vmbus data is
> initialized.
> 

Perhaps I haven't been clear enough. The NULL pointer dereference
happens in the call to hv_ringbuf_avail_percent() which is used to
calculate ring_avail. 

So we need to stop the queues before calling it if the channel's ring
buffers haven't been allocated yet, but OTOH we should only stop the
queues based upon the value of ring_avail, so this leads into a chicken
and egg situation. 

Is my observation here correct? Please correct me if I am wrong,
Stephen.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ