linux-kernel - Re: [haswell_crtc_enable] WARNING: CPU: 3 PID: 109 at drivers/gpu/drm/drm_vblank.c:1066 drm_wait_one


Message-ID: <20171030231742.63bn6obonr3pfjnh@wfg-t540p.sh.intel.com>
Date:   Tue, 31 Oct 2017 00:17:42 +0100
From:   Fengguang Wu <fengguang.wu@...el.com>
To:     Rodrigo Vivi <rodrigo.vivi@...el.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        intel-gfx <intel-gfx@...ts.freedesktop.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Jani Nikula <jani.nikula@...el.com>
Subject: Re: [haswell_crtc_enable] WARNING: CPU: 3 PID: 109 at
 drivers/gpu/drm/drm_vblank.c:1066 drm_wait_one_vblank+0x18f/0x1a0 [drm]

Hi Rodrigo,

On Mon, Oct 30, 2017 at 01:03:51PM -0700, Rodrigo Vivi wrote:
>On Mon, Oct 30, 2017 at 07:10:11PM +0000, Linus Torvalds wrote:
>> On Mon, Oct 30, 2017 at 12:00 AM, Fengguang Wu <fengguang.wu@...el.com> wrote:
>> > CC intel-gfx.
>>
>> Thanks, these are all interesting (even if some of them seem to be
>> from random kernels).
>>
>> Fengguang, is this a new script that you started running? Because I'm
>> *hoping* it's not that rc6 suddenly seems so flaky, and it's really
>> that you now have a nice new script that started reporting these
>> things better, even though many of them may be old?
>
>Yep, on our side there isn't anything in 4.14-rc6 from i915 that could justify
>these issues. I hope...

It's an old issue. It occurs in 4.10, too.

I noticed that all the warnings happen in one single machine:
lkp-hsw-d01, which has Intel(R) Core(TM) i7-4770 CPU.

>Well, on the other hand, it would be easier to bisect if this is a 4.14-rc6 thing,
>since we just got one patch for i915 and a few patches for gvt in this -rc6.
>
>>
>> This particular one I will have to leave to the intel gfx people to comment on.
>
>Why is bisecting hard in this particular case? Is it random?

It's not reliable. The logs show that dozens of bisects have been attempted
on this warning, and all of them failed.

>Is it reported anywhere we could access the full logs?

Nope.

>I couldn't find any related open issue on our side.
>Nothing like this appears on our CI either.
>
>In other cases I've seen with a vblank timeout like this, something
>else on the CPU/GPU had already died beforehand, hence the vblank was never received.
>So I'd like to see more logs to get a better idea.

OK, attached are two dmesg logs for 4.14-rc6 and one for 4.10.

Thanks,
Fengguang

[Attachment: "dmesg-lkp-hsw-d01:20171028202439:x86_64-rhel-7.2:gcc-6:4.14.0-rc6:1" (text/plain, 98252 bytes)]

[Attachment: "dmesg-lkp-hsw-d01:20171028224052:x86_64-rhel-7.2:gcc-6:4.14.0-rc6:1" (text/plain, 99414 bytes)]

[Attachment: "dmesg-lkp-hsw-d01:20170717184716:x86_64-rhel-7.2:gcc-6:4.10.0:1" (text/plain, 155014 bytes)]