linux-kernel - Re: [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated cgroup scheduling

Message-ID: <AANLkTi=GjnyE1_RosS_L_sn=QCTDSgO7v9EL+1bpJTu7@mail.gmail.com>
Date:	Thu, 20 Jan 2011 09:58:42 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Knut Petersen <Knut_Petersen@...nline.de>
Cc:	airlied@...ux.ie, jesse.barnes@...el.com,
	linux-kernel@...r.kernel.org,
	intel-gfx <intel-gfx@...ts.freedesktop.org>,
	Mike Galbraith <efault@....de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated
 cgroup scheduling

On Thu, Jan 20, 2011 at 9:29 AM, Knut Petersen
<Knut_Petersen@...nline.de> wrote:
> Kernel 2.6.38-rc1 and -git1 will lock my AOpen i915GMm-HFS
> at the end of KDE startup if automatic process group scheduling
> is activated in kernel config. A hard reset is necessary.
> Without automatic process group scheduling everything is ok.

Interesting. Most likely timing-related, but maybe there's some actual
memory corruption. Adding the scheduler guys just in case.

It might be interesting to see if enabling SLUB debugging makes any
difference. Interesting for two reasons:

 - it may just make the problem go away because it changes timings
radically enough (which is the bad case, since that doesn't really
help us very much)

 - maybe it's not timing-related, and instead shows some slab misuse
and corruption that explains the problem.
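
For reference, a rough sketch of what "enabling SLUB debugging" could
look like. CONFIG_SLUB_DEBUG, CONFIG_SLUB_DEBUG_ON and the slub_debug
boot parameter are existing kernel options, but the particular flag
combination below is only an assumption about what would be useful
here, not something tested on this machine:

```
# Kernel config fragment (assumes the kernel is built with the SLUB allocator)
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
# Optional: turn the debug checks on by default for every cache
CONFIG_SLUB_DEBUG_ON=y

# Alternatively, leave CONFIG_SLUB_DEBUG_ON unset and enable the checks
# at boot by appending this to the kernel command line:
#   slub_debug=FZPU
# F = sanity checks, Z = red zoning, P = poisoning, U = alloc/free tracking
```

If corruption is detected, the reports should show up in the kernel log,
assuming the machine stays alive long enough to write them out.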

I dunno.

> Reproducibility of bug: 100 %
> System: AOpen i915GMm-HFS, 2GB, Pentium M
> Distribution: openSuSE 11.3
>
> cu,
>  Knut
>
> Jan 20 17:57:07 golem kernel: [   58.087054] ------------[ cut here ]------------
> Jan 20 17:57:07 golem kernel: [   58.087117] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3254!

Grr. Hate people who do BUG_ON() calls that kill the machine and make
things harder to debug.

What happens if you replace that

  BUG_ON(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT);

with a

  if (WARN_ON_ONCE(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
    return -ENOMEM;

or similar? Does it limp along? I'm not suggesting that as a fix
(obviously), but I do think that we have way too many BUG_ON's, and
too few people thinking about "how can I make the machine possibly
limp on so that the oops is easier to see and report".
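
To illustrate the warn-once-and-fail pattern outside the kernel, here is
a minimal userspace sketch. Everything in it is a stand-in: the
WARN_ON_ONCE_SKETCH macro only mimics the shape of the kernel's
WARN_ON_ONCE (it is not the kernel implementation, and it relies on the
GNU C statement-expression extension), struct gem_object and
pin_object() are hypothetical, and MAX_PIN_COUNT is an arbitrary value
chosen for the demo rather than the real
DRM_I915_GEM_OBJECT_MAX_PIN_COUNT.

```c
#include <errno.h>
#include <stdio.h>

/* Arbitrary demo limit; NOT the real DRM_I915_GEM_OBJECT_MAX_PIN_COUNT. */
#define MAX_PIN_COUNT 15

/* Hypothetical stand-in for the driver's object; only the counter matters. */
struct gem_object {
	int pin_count;
};

/*
 * Userspace imitation of WARN_ON_ONCE(): evaluates to the truth value of
 * cond and prints a warning the first time cond is true at this call site.
 * Needs GNU C statement expressions (gcc, clang).
 */
#define WARN_ON_ONCE_SKETCH(cond) ({					\
	static int warned_;						\
	int cond_ = !!(cond);						\
	if (cond_ && !warned_) {					\
		warned_ = 1;						\
		fprintf(stderr, "WARNING: %s:%d: %s\n",			\
			__FILE__, __LINE__, #cond);			\
	}								\
	cond_;								\
})

/*
 * Instead of killing the machine when the counter is saturated, complain
 * once and return an error so the caller can fail the operation.
 */
int pin_object(struct gem_object *obj)
{
	if (WARN_ON_ONCE_SKETCH(obj->pin_count == MAX_PIN_COUNT))
		return -ENOMEM;

	obj->pin_count++;
	return 0;
}
```

Calling pin_object() repeatedly on the same object succeeds until the
counter reaches MAX_PIN_COUNT; after that every call returns -ENOMEM,
and only the first failing call prints the warning. The point is that
the caller gets an error it can propagate and the log survives to be
reported.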

                     Linus