linux-kernel - Re: Debugging Thinkpad T430s occasional suspend failure.

Message-ID: <CA+55aFwF1qZmPaL5LPA+0ys68s=TF7wfXpb5y9GWi0q5RJDJ-Q@mail.gmail.com>
Date:	Sat, 16 Feb 2013 15:02:11 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hughd@...gle.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	David Airlie <airlied@...ux.ie>
Cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Paul McKenney <paul.mckenney@...aro.org>,
	DRI <dri-devel@...ts.freedesktop.org>
Subject: Re: Debugging Thinkpad T430s occasional suspend failure.

On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@...gle.com> wrote:
>
> I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> yesterday, to save an oopsing core kernel ip there, instead of hashed
> pm trace info (it makes sense in this case to invert your sequence,
> putting the high order into years and the low order into minutes).

That sounds like a good idea in general. The PM_TRACE() thing was done
to figure out things that locked up the PCI bus etc, but encoding the
oopses during suspend sounds like a really good idea too.

Is your patch clean enough to just be made part of the standard
PM_TRACE infrastructure, or was it something really hacky and one-off?
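
For illustration only, here is a rough user-space sketch of the kind of
encoding being discussed: packing a number into clock fields with the high
order in the years and the low order in the minutes. This is not Hugh's
patch and not the kernel's set_magic_time(); the struct, the function names
and the field ranges (100 years, 12 months, 28 days, 24 hours, 60 minutes)
are all assumptions. With those ranges the clock holds about 25.5 bits.

```c
#include <stdint.h>

/* Hypothetical container for the clock fields used by this sketch. */
struct magic_time {
	unsigned int year;	/* 0..99 */
	unsigned int mon;	/* 0..11 */
	unsigned int mday;	/* 1..28, valid in every month */
	unsigned int hour;	/* 0..23 */
	unsigned int min;	/* 0..59 */
};

/* Number of distinct values that fit: 48,384,000. */
#define MAGIC_CAPACITY (100u * 12u * 28u * 24u * 60u)

/* Mixed-radix encode: low order into minutes, high order into years. */
static int magic_encode(uint32_t n, struct magic_time *t)
{
	if (n >= MAGIC_CAPACITY)
		return -1;	/* value does not fit in the clock fields */

	t->min = n % 60;
	n /= 60;
	t->hour = n % 24;
	n /= 24;
	t->mday = 1 + n % 28;
	n /= 28;
	t->mon = n % 12;
	n /= 12;
	t->year = n;		/* guaranteed < 100 by the range check */
	return 0;
}

/* Inverse of magic_encode(): rebuild the value from the clock fields. */
static uint32_t magic_decode(const struct magic_time *t)
{
	uint32_t n = t->year;

	n = n * 12 + t->mon;
	n = n * 28 + (t->mday - 1);
	n = n * 24 + t->hour;
	n = n * 60 + t->min;
	return n;
}
```

A full 64-bit kernel ip obviously does not fit, so a scheme like this can
only carry the low bits of an address (or an offset from some known base),
and the rest has to be inferred when decoding after the reboot.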

> Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
>
> /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
>          * enable dithering as needed, but that costs bandwidth.  So choose
>          * the minimum value that expresses the full color range of the fb but
>          * also stays within the max display bpc discovered above.
>          */
>
>         switch (fb->depth) {
> ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
>
> (gcc chose to pass a pointer to fb->depth down to the function,
> instead of fb itself, since that is the only use of it there.)
>
> I expect that fb is NULL; but with an average of one failure to resume
> per day, and ~26 bits of info per crash, this is not a fast procedure!
>
> I notice that intel_pipe_set_base() allows for NULL fb,
> so I'm currently running with the oops-in-rtc hackery, plus
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {
>
> There's been a fair bit of change to intel_display.c since 3.7 (if
> my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> ironlake_, but I've not yet spotted anything obvious; nor yet looked
> to see where fb would originate from anyway.
>
> Once I've got just a little more info out of it, I'll start another
> thread addressed principally to the drm/gpu/i915 guys.

I think it's worth giving them a heads-up already, so I've cc'd
the main suspects here.

Daniel, Dave - any comments about a NULL fb in
intel_choose_pipe_bpp_dither() during either suspend or resume? Some
googling shows this:

    https://bugzilla.redhat.com/show_bug.cgi?id=895123

which sounds remarkably similar, and is also during a suspend attempt
(but apparently Satish got a full oops out). Perhaps some timing race
with a worker entry?

                        Linus