linux-kernel - RE: [PANIC, hyperv] BUG: unable to handle kernel paging request at ffff880077800004 (hv_ringbuffer


Message-ID: <EE124450C0AAF944A40DD71E61F878C995E93E@SINEX14MBXC419.southpacific.corp.microsoft.com>
Date:	Tue, 26 Aug 2014 10:30:54 +0000
From:	Dexuan Cui <decui@...rosoft.com>
To:	Sitsofe Wheeler <sitsofe@...il.com>
CC:	KY Srinivasan <kys@...rosoft.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PANIC, hyperv] BUG: unable to handle kernel paging request at
 ffff880077800004 (hv_ringbuffer_write)

> -----Original Message-----
> From: Sitsofe Wheeler
> Sent: Tuesday, August 26, 2014 1:42 AM
> > > [    7.645526] hv_vmbus: registering driver hyperv_fb
> > > [    7.657553] BUG: unable to handle kernel paging request at
> > > ffff880077800004
> > > [    7.658224] IP: [<ffffffff8159a7ac>] hv_ringbuffer_write+0x7c/0x150
> > > [    7.658224] PGD 2da9067 PUD 2dac067 PMD 7fa27067 PTE
> > > 8000000077800060
> > > [    7.658224] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > It seems
> > hv_ringbuffer_write() ->
> >     hv_get_ringbuffer_availbytes():
> >         reading rbi->ring_buffer->read_index causes a page fault.
> >
> > It looks rbi->ring_buffer was unmapped somehow according to the
> > semantics of CONFIG_DEBUG_PAGEALLOC??? Or, was there a memory
> > corruption somewhere?
> >
> > It looks the panic will disappear if the guest isn't configured with a
> > "Network Adapter ".
IMO it has nothing to do with hyperv netvsc: here hyperv_fb is the first
driver to invoke vmbus_open(), and netvsc's vmbus_open() hasn't been
invoked yet.

> This sounds very fishy as if network setup has left things in a bad
> state. 
Ditto. I doubt the network driver causes the issue.

> What baffles me is the whole UP vs SMP thing - why would UP
> make this show up consistently? Perhaps some assertions could be added
> to check that rbi->ring_buffer still has sane values in it after
> operations on it are finished?
With more testing, I found that vcpus=2 hits the same issue too, though only
with a small probability.
vcpus=4 seems fine in my limited tests.
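
The sanity assertions suggested in the quoted text could be sketched roughly
as below. This is a user-space model, not kernel code: ring_hdr_model,
ring_info_model and ring_looks_sane() are hypothetical names, and the only
assumption is that both indices must stay below the ring's data size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the shared ring header: just the two indices. */
struct ring_hdr_model {
	uint32_t write_index;
	uint32_t read_index;
};

/* Hypothetical model of the per-direction ring info (rbi in the mail). */
struct ring_info_model {
	struct ring_hdr_model *ring_buffer;
	uint32_t ring_datasize;	/* bytes of payload space in the ring */
};

/* Returns true when the indices are plausible for this ring size. */
static bool ring_looks_sane(const struct ring_info_model *rbi)
{
	if (rbi == NULL || rbi->ring_buffer == NULL)
		return false;
	if (rbi->ring_datasize == 0)
		return false;
	return rbi->ring_buffer->read_index < rbi->ring_datasize &&
	       rbi->ring_buffer->write_index < rbi->ring_datasize;
}
```

A caller could wrap this in assert(ring_looks_sane(rbi)) after each ring
operation. Note that a check like this only catches indices that were stomped
on; it cannot detect a ring_buffer pointer that is non-NULL but already
freed, which is the kind of access CONFIG_DEBUG_PAGEALLOC traps.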

> I guess you could try switching things around and using
> kmemcheck (https://www.kernel.org/doc/Documentation/kmemcheck.txt ).
> If the whole area close to rbi->ring_buffer->read_index is being stomped
> on it should show up. If it's just being set to a duff value or freed,
> that's going to be harder to track down, although poisoning before
> freeing should allow us to distinguish that case...
Thanks for the info.

Actually, I found the direct cause of the panic:
sometimes vmbus_post_msg() returns 4 (HV_STATUS_INVALID_ALIGNMENT), and
vmbus_open() then does "goto error1" and frees the ring buffer, but it
doesn't propagate the error to its caller, synthvid_connect_vsp(). The
caller carries on as if the channel were open, so the later access to
ring_buffer->read_index touches freed memory and is caught by
CONFIG_DEBUG_PAGEALLOC.
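
To make the failure pattern concrete, here is a minimal user-space sketch.
It is neither the kernel source nor the attached patch: channel_model,
post_msg_model() and both open_* functions are hypothetical, and how the
real error path ends up returning success is an assumption (modelled here
as a result variable that is never updated on failure).

```c
#include <assert.h>
#include <stdlib.h>

#define HV_STATUS_INVALID_ALIGNMENT 4	/* value quoted in this mail */

struct ring_model {
	unsigned int read_index;
	unsigned int write_index;
};

struct channel_model {
	struct ring_model *ring_buffer;
};

/* Hypothetical stand-in for vmbus_post_msg(): returns the given status. */
static int post_msg_model(int status)
{
	return status;
}

/* Models the reported behaviour: the error path frees, yet returns 0. */
static int open_swallowing_error(struct channel_model *ch, int post_status)
{
	int ret;
	int err = 0;	/* assumption: never updated when post_msg fails */

	ch->ring_buffer = calloc(1, sizeof(*ch->ring_buffer));
	if (!ch->ring_buffer)
		return -1;

	ret = post_msg_model(post_status);
	if (ret != 0)
		goto error1;
	return 0;

error1:
	free(ch->ring_buffer);
	ch->ring_buffer = NULL;	/* NULLed only so the model stays well defined */
	return err;		/* caller sees success although the ring is gone */
}

/* Same flow, but the failure status reaches the caller. */
static int open_propagating_error(struct channel_model *ch, int post_status)
{
	int ret;

	ch->ring_buffer = calloc(1, sizeof(*ch->ring_buffer));
	if (!ch->ring_buffer)
		return -1;

	ret = post_msg_model(post_status);
	if (ret != 0)
		goto error1;
	return 0;

error1:
	free(ch->ring_buffer);
	ch->ring_buffer = NULL;
	return ret;		/* non-zero: caller must not touch the ring */
}
```

With the first variant a caller that only checks the return value will go on
to dereference the ring buffer; with the second it gets a non-zero status
and can bail out before any ring access.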

I don't see any "invalid alignment" here... and I can't explain why vcpus=4
seems OK... Debugging WIP.

BTW, please try the attached patch.
With it, the VM doesn't panic on my side with vcpus=1 and can boot to a
shell prompt (though the boot-up looks very slow: I have to wait several minutes...).

> From your analysis this doesn't sound framebuffer related - perhaps we
> could drop the linuxfb CC's on these mails going forward?
OK. I removed linuxfb and Jean.

Thanks,
-- Dexuan

Download attachment "fix_vmbus_open.patch" of type "application/octet-stream" (1058 bytes)