Date:	Sun, 29 Jun 2014 13:24:04 +0300
From:	Gleb Natapov <gleb@...nel.org>
To:	Jan Kiszka <jan.kiszka@....de>
Cc:	Borislav Petkov <bp@...en8.de>,
	Paolo Bonzini <pbonzini@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>, x86-ml <x86@...nel.org>,
	kvm@...r.kernel.org, Jörg Rödel <joro@...tes.org>
Subject: Re: __schedule #DF splat

On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> On 2014-06-29 08:46, Gleb Natapov wrote:
> > On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)
> >>
> >> kvm injects the #PF into the guest.
> >>
> >>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> >>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> >>
> >> Second #PF at the same address and kvm injects the #DF.
> >>
> >> BUT(!), why?
> >>
> >> I probably am missing something but WTH are we pagefaulting at a
> >> user address in context_switch() while doing a lockdep call, i.e.
> >> spin_release? We're not touching any userspace gunk there AFAICT.
> >>
> >> Is this an async pagefault or so which kvm is doing so that the guest
> >> rip is actually pointing at the wrong place?
> >>
> > There is nothing in the trace that points to an async pagefault as far as I see.
> > 
> >> Or something else I'm missing, most probably...
> >>
> > Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> > kvm_multiple_exception() to see which two exceptions are combined into #DF.
> > 
> 
> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> when patch-disabling the vmport in QEMU.
> 
> Let me know if I can help with the analysis.
>
Bisection would be great, of course. One thing that is special about
vmport that comes to mind is that it reads vcpu registers to userspace and
writes them back. IIRC "info registers" does the same. Can you see if the
problem is reproducible with vmport disabled, but doing "info registers"
in the qemu console? Although the trace does not show any exits to
userspace near the failure...
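
For reference, here is a rough model of the architectural rule for
combining a pending exception with a new one, which is what decides when a
#DF gets injected. It is only an illustrative Python sketch of the rule as
described in the Intel SDM, not the KVM code; the function and constant
names are made up for the example.

```python
# Illustrative model of the x86 exception-combining rule (Intel SDM,
# "Interrupt 8 - Double Fault Exception"). Not kernel code; all names
# here are hypothetical and chosen only for this sketch.

# Exception vectors used below.
DE, DF, TS, NP, SS, GP, PF = 0, 8, 10, 11, 12, 13, 14

BENIGN, CONTRIBUTORY, PAGE_FAULT = "benign", "contributory", "page_fault"


def exception_class(vector):
    """Classify a vector as benign, contributory, or page fault."""
    if vector == PF:
        return PAGE_FAULT
    if vector in (DE, TS, NP, SS, GP):
        return CONTRIBUTORY
    return BENIGN


def combine(pending, new):
    """Return the vector to deliver when `new` is raised while `pending`
    is still being delivered. None stands for shutdown (triple fault)."""
    new_class = exception_class(new)

    # A contributory exception or page fault while delivering #DF
    # leads to shutdown; a benign one is handled serially.
    if pending == DF:
        return None if new_class != BENIGN else new

    pending_class = exception_class(pending)

    # contributory + contributory, or page fault + (page fault or
    # contributory), is promoted to a double fault.
    if pending_class == CONTRIBUTORY and new_class == CONTRIBUTORY:
        return DF
    if pending_class == PAGE_FAULT and new_class != BENIGN:
        return DF

    # Everything else is handled serially: deliver the new exception.
    return new
```

With this model, combine(PF, PF) gives DF, which matches the trace above:
a second #PF arrives while the first one is still being delivered, and a
#DF is injected instead.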

--
			Gleb.