linux-kernel - Re: __schedule #DF splat


Date:	Sun, 29 Jun 2014 13:24:04 +0300
From:	Gleb Natapov <gleb@...nel.org>
To:	Jan Kiszka <jan.kiszka@....de>
Cc:	Borislav Petkov <bp@...en8.de>,
	Paolo Bonzini <pbonzini@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>, x86-ml <x86@...nel.org>,
	kvm@...r.kernel.org, Jörg Rödel <joro@...tes.org>
Subject: Re: __schedule #DF splat

On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> On 2014-06-29 08:46, Gleb Natapov wrote:
> > On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)
> >>
> >> kvm injects the #PF into the guest.
> >>
> >>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> >>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> >>
> >> Second #PF at the same address and kvm injects the #DF.
> >>
> >> BUT(!), why?
> >>
> >> I probably am missing something but WTH are we pagefaulting at a
> >> user address in context_switch() while doing a lockdep call, i.e.
> >> spin_release? We're not touching any userspace gunk there AFAICT.
> >>
> >> Is this an async pagefault or so which kvm is doing so that the guest
> >> rip is actually pointing at the wrong place?
> >>
> > There is nothing in the trace that points to an async pagefault as far as I see.
> > 
> >> Or something else I'm missing, most probably...
> >>
> > Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> > kvm_multiple_exception() to see which two exceptions are combined into #DF.
> > 
> 
> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> when patch-disabling the vmport in QEMU.
> 
> Let me know if I can help with the analysis.
>
Bisection would be great of course. One thing that is special about
vmport that comes to mind is that it reads vcpu registers to userspace and
writes them back. IIRC "info registers" does the same. Can you see if the
problem is reproducible with disabled vmport, but doing "info registers"
in qemu console? Although the trace does not show any exits to userspace
near the failure...
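
As background for the kvm_multiple_exception() suggestion above, here is a
minimal userspace sketch of the architectural rule for merging two
exceptions into #DF, following the exception classes described in the
Intel SDM. It is not the KVM source: classify() and combine() are names
made up for illustration, and the case where the first exception is
already a #DF (triple fault) is deliberately left out.

```c
/* Userspace sketch only; hypothetical helper names, not kernel code. */

/* x86 exception vectors relevant to the #DF rule. */
enum {
	DE_VECTOR = 0,   /* divide error */
	DF_VECTOR = 8,   /* double fault */
	TS_VECTOR = 10,  /* invalid TSS */
	NP_VECTOR = 11,  /* segment not present */
	SS_VECTOR = 12,  /* stack-segment fault */
	GP_VECTOR = 13,  /* general protection */
	PF_VECTOR = 14,  /* page fault */
};

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Map a vector to its class for the purpose of the #DF rule. */
enum exc_class classify(int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXC_PAGE_FAULT;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXC_CONTRIBUTORY;
	default:
		return EXC_BENIGN;
	}
}

/*
 * Return the vector that ends up being delivered when `second` is raised
 * while `first` is still being delivered:
 *   contributory + contributory              -> #DF
 *   page fault   + contributory / page fault -> #DF
 *   anything else                            -> `second`, handled serially
 */
int combine(int first, int second)
{
	enum exc_class c1 = classify(first);
	enum exc_class c2 = classify(second);

	if ((c1 == EXC_CONTRIBUTORY && c2 == EXC_CONTRIBUTORY) ||
	    (c1 == EXC_PAGE_FAULT && c2 != EXC_BENIGN))
		return DF_VECTOR;

	return second;
}
```

With these rules combine(PF_VECTOR, PF_VECTOR) gives DF_VECTOR, which is
the pattern in the trace: a second #PF arrives while the first one is
still pending, and #DF is injected instead.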

--
			Gleb.