Message-ID: <EE124450C0AAF944A40DD71E61F878C995E93E@SINEX14MBXC419.southpacific.corp.microsoft.com>
Date: Tue, 26 Aug 2014 10:30:54 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Sitsofe Wheeler <sitsofe@...il.com>
CC: KY Srinivasan <kys@...rosoft.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PANIC, hyperv] BUG: unable to handle kernel paging request at
ffff880077800004 (hv_ringbuffer_write)
> -----Original Message-----
> From: Sitsofe Wheeler
> Sent: Tuesday, August 26, 2014 1:42 AM
> > > [ 7.645526] hv_vmbus: registering driver hyperv_fb
> > > [ 7.657553] BUG: unable to handle kernel paging request at
> > > ffff880077800004
> > > [ 7.658224] IP: [<ffffffff8159a7ac>] hv_ringbuffer_write+0x7c/0x150
> > > [ 7.658224] PGD 2da9067 PUD 2dac067 PMD 7fa27067 PTE
> > > 8000000077800060
> > > [ 7.658224] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > It seems
> > hv_ringbuffer_write() ->
> > hv_get_ringbuffer_availbytes():
> > reading rbi->ring_buffer->read_index causes a page fault.
> >
> > It looks rbi->ring_buffer was unmapped somehow according to the
> > semantics of CONFIG_DEBUG_PAGEALLOC??? Or, was there a memory
> > corruption somewhere?
> >
> > It looks the panic will disappear if the guest isn't configured with a
> > "Network Adapter ".
IMO it has nothing to do with the hyperv netvsc driver: here hyperv_fb is the
first driver to invoke vmbus_open(), and netvsc's vmbus_open() hasn't been
invoked yet when the panic happens.
> This sounds very fishy as if network setup has left things in a bad
> state.
Ditto. I doubt the network driver causes the issue.
> What is baffles me is the whole UP vs SMP thing - why would UP
> make this show up consistently? Perhaps some assertions could be added
> to check that rbi->ring_buffer still has sane values in it after
> operations on it are finished?
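To make the assertion idea concrete, here is a minimal userspace sketch. Everything in it is hypothetical: struct demo_ring and the demo_* helpers are simplified stand-ins, not the real kernel ring buffer layout or API. Note that a check like this can only catch index values that were stomped on; it cannot detect a ring buffer that has already been freed, because reading the indices would itself fault under CONFIG_DEBUG_PAGEALLOC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring header; NOT the real kernel layout. */
struct demo_ring {
	uint32_t write_index;
	uint32_t read_index;
	uint32_t data_size;	/* size of the data area in bytes */
};

/* Both indices must point inside the data area of a non-empty ring. */
static bool demo_ring_is_sane(const struct demo_ring *r)
{
	return r != NULL && r->data_size != 0 &&
	       r->read_index < r->data_size &&
	       r->write_index < r->data_size;
}

/* Free space available to the writer, assuming the indices are sane. */
static uint32_t demo_ring_bytes_to_write(const struct demo_ring *r)
{
	uint32_t rd = r->read_index;
	uint32_t wr = r->write_index;

	return wr >= rd ? r->data_size - (wr - rd) : rd - wr;
}
```

A caller would run demo_ring_is_sane() before and after each operation on the ring and complain loudly if it ever returns false.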
With more tests, I found that vcpus=2 hits the same issue, though with a
lower probability.
vcpus=4 seems fine in my limited tests.
> I guess you could try switching things around and using
> kmemcheck (https://www.kernel.org/doc/Documentation/kmemcheck.txt). If
> the whole area close to rbi->ring_buffer->read_index is being stomped on
> it should show up. If it's just being set to a duff value or freed that
> going to be harder to track down although poisoning before freeing
> should allow us to distinguish that case...
Thanks for the info.
Actually I found the direct cause of the panic:
sometimes vmbus_post_msg() returns 4 (HV_STATUS_INVALID_ALIGNMENT), and
vmbus_open() then does "goto error1" and frees the ring buffer, but it
doesn't propagate the error to its caller, synthvid_connect_vsp(). So the
later access to ring_buffer->read_index touches freed memory and is caught by
CONFIG_DEBUG_PAGEALLOC.
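The failure pattern can be illustrated with a small userspace sketch. This is not the attached patch and not the real vmbus code: struct demo_channel, demo_post_msg() and the other demo_* names are hypothetical stand-ins, and the choice of -EIO is only an assumption for illustration. The point is that once the open path frees the ring buffer, it must also return an error (and ideally clear the pointer), so the caller never sees success together with a freed buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in; NOT the real vmbus channel structure. */
struct demo_channel {
	void *ring_buffer;	/* NULL whenever the channel is not usable */
};

/*
 * Pretend message-post call: returns 0 on success or a positive status
 * code (such as 4) on failure. The status is passed in to simulate both.
 */
static int demo_post_msg(int simulated_status)
{
	return simulated_status;
}

/*
 * Open a channel. Any nonzero status from demo_post_msg() is converted
 * into a negative errno, and the pointer is cleared after freeing.
 */
static int demo_channel_open(struct demo_channel *ch, size_t ring_size,
			     int simulated_status)
{
	int status;

	assert(ch != NULL);

	ch->ring_buffer = calloc(1, ring_size);
	if (!ch->ring_buffer)
		return -ENOMEM;

	status = demo_post_msg(simulated_status);
	if (status != 0) {
		free(ch->ring_buffer);
		ch->ring_buffer = NULL;
		return -EIO;	/* assumption: illustrative errno value */
	}

	return 0;
}

/* Release the ring buffer and mark the channel unusable. */
static void demo_channel_close(struct demo_channel *ch)
{
	free(ch->ring_buffer);
	ch->ring_buffer = NULL;
}
```

With this shape, a caller that checks the return value of demo_channel_open() bails out on failure instead of going on to write into the ring.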
I don't see any "invalid alignment" here, and I can't yet explain why vcpus=4
seems OK. Debugging is still in progress.
BTW, please try the attached patch.
With it, the VM doesn't panic on my side with vcpus=1 and can boot to a
shell prompt (though the boot-up looks very slow: I have to wait for several
minutes).
> From your analysis this doesn't sound framebuffer related - perhaps we
> could drop the linuxfb CC's on these mails going forward?
OK. I removed linuxfb and Jean.
Thanks,
-- Dexuan
Download attachment "fix_vmbus_open.patch" of type "application/octet-stream" (1058 bytes)