linux-kernel - Re: __schedule #DF splat

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140629105339.GF18167@minantech.com>
Date:	Sun, 29 Jun 2014 13:53:40 +0300
From:	Gleb Natapov <gleb@...nel.org>
To:	Jan Kiszka <jan.kiszka@....de>
Cc:	Borislav Petkov <bp@...en8.de>,
	Paolo Bonzini <pbonzini@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>, x86-ml <x86@...nel.org>,
	kvm@...r.kernel.org, Jörg Rödel <joro@...tes.org>
Subject: Re: __schedule #DF splat

On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
> On 2014-06-29 12:24, Gleb Natapov wrote:
> > On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> >> On 2014-06-29 08:46, Gleb Natapov wrote:
> >>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
> >>>>
> >>>> kvm injects the #PF into the guest.
> >>>>
> >>>>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> >>>>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> >>>>
> >>>> Second #PF at the same address and kvm injects the #DF.
> >>>>
> >>>> BUT(!), why?
> >>>>
> >>>> I probably am missing something but WTH are we pagefaulting at a
> >>>> user address in context_switch() while doing a lockdep call, i.e.
> >>>> spin_release? We're not touching any userspace gunk there AFAICT.
> >>>>
> >>>> Is this an async pagefault or so which kvm is doing so that the guest
> >>>> rip is actually pointing at the wrong place?
> >>>>
> >>> There is nothing in the trace that point to async pagefault as far as I see.
> >>>
> >>>> Or something else I'm missing, most probably...
> >>>>
> >>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> >>> kvm_multiple_exception() to see which two exception are combined into #DF.
> >>>
> >>
> >> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> >> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> >> when patch-disabling the vmport in QEMU.
> >>
> >> Let me know if I can help with the analysis.
> >>
> > Bisection would be great of course. Once thing that is special about
> > vmport that comes to mind is that it reads vcpu registers to userspace and
> > write them back. IIRC "info registers" does the same. Can you see if the
> > problem is reproducible with disabled vmport, but doing "info registers"
> > in qemu console? Although trace does not should any exists to userspace
> > near the failure...
> 
> Yes, info registers crashes the guest after a while as well (with
> different backtrace due to different context).
> 
Oh crap. Bisection would be most helpful. Just to be absolutely sure
that this is not QEMU problem: does exactly same QEMU version work with
older kernels?

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/