linux-kernel - Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Message-ID: <5088CC9B.6010009@gmail.com>
Date:	Wed, 24 Oct 2012 22:22:35 -0700
From:	"Justin P. Mattock" <justinmattock@...il.com>
To:	Daniel Vetter <daniel@...ll.ch>
CC:	"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

>
>
> On Tue, Oct 23, 2012 at 10:06:52AM -0700, Justin P. Mattock wrote:
>  > This is happening both with MAINLINE and NEXT.
>  >
>  > Basically, the system runs fine, then under load it becomes
>  > really sluggish and unresponsive. I was able to get a dmesg of the
>  > error:
>  >
>  > [ 7745.007008] ath9k 0000:05:00.0 wlan0: disabling VHT as WMM/QoS is not supported by the AP
>  > [ 7745.007736] wlan0: associate with 68:7f:74:b8:05:82 (try 1/3)
>  > [ 7745.011456] wlan0: RX AssocResp from 68:7f:74:b8:05:82 (capab=0x411 status=0 aid=5)
>  > [ 7745.011529] wlan0: associated
>  > [ 8120.812482] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>  > [ 8120.812642] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>  > [ 8122.328682] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>  > [ 8122.328845] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>  > [ 8122.328850] [drm:i915_reset] *ERROR* Failed to reset chip.
>  >
>  > full log is here: http://fpaste.org/7xH8/
>  >
>  > As for good kernels, from what I remember 3.6.0-rc1 was fine. I can
>  > try a bisect on this once I get the time, or if anybody has a patch
>  > I can test it.
>
> Can you please rehang your machine, and then grab the i915_error_state
> from debugfs? That contains the GPU hang dump we need to diagnose things.
>
> And the bisect would obviously be awesome.
>
> Thanks, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

It took a while to trigger, but it finally fired.

Here is a link to the file (intel_error_decode):
http://www.filefactory.com/file/22bypyjhs4mx

The file was too large to send to the list. Let me know if you need more
info on this.
Also, if anybody has ideas on how to trigger this, that would be
appreciated, so the bisect can be more precise. Right now I don't think
it's even worth it: since I can't reliably trigger the crash, the bisect
could go astray and point to the wrong commit (which has happened in the
past). But then again, you never know.

Justin P. Mattock