Message-ID: <AANLkTi=GjnyE1_RosS_L_sn=QCTDSgO7v9EL+1bpJTu7@mail.gmail.com>
Date:	Thu, 20 Jan 2011 09:58:42 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Knut Petersen <Knut_Petersen@...nline.de>
Cc:	airlied@...ux.ie, jesse.barnes@...el.com,
	linux-kernel@...r.kernel.org,
	intel-gfx <intel-gfx@...ts.freedesktop.org>,
	Mike Galbraith <efault@....de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated
 cgroup scheduling

On Thu, Jan 20, 2011 at 9:29 AM, Knut Petersen
<Knut_Petersen@...nline.de> wrote:
> Kernel 2.6.38-rc1 and -git1 will lock my AOpen i915GMm-HFS
> at the end of KDE startup if automatic process group scheduling
> is activated in kernel config. A hard reset is necessary.
> Without automatic process group scheduling everything is ok.

Interesting. Most likely timing-related, but maybe there's some actual
memory corruption. Adding the scheduler guys just in case.

It might be interesting to see if enabling SLUB debugging makes any
difference. Interesting for two reasons:

 - it may just make the problem go away because it changes timings
radically enough (which is the bad case, since that doesn't really
help us very much)

 - maybe it's not timing-related, and instead shows some slab misuse
and corruption that explains the problem.

I dunno.

> Reproducibility of bug: 100 %
> System: AOpen i915GMm-Hfs, 2GB, Pentium M
> Distribution: openSuSE 11.3
>
> cu,
>  Knut
>
> Jan 20 17:57:07 golem kernel: [   58.087054] ------------[ cut here ]------------
> Jan 20 17:57:07 golem kernel: [   58.087117] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3254!

Grr. Hate people who do BUG_ON() calls that kill the machine and make
things harder to debug.

What happens if you replace that

  BUG_ON(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT);

with a

  if (WARN_ON_ONCE(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
    return -ENOMEM;

or similar? Does it limp along? I'm not suggesting that as a fix
(obviously), but I do think that we have way too many BUG_ON's, and
too few people thinking about "how can I make the machine possibly
limp on so that the oops is easier to see and report".
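
For illustration only, here is a minimal userspace sketch of the
warn-once-and-return-an-error pattern suggested above. It is not the i915
code: struct gem_object, MAX_PIN_COUNT, warn_on_once() and pin_object() are
hypothetical stand-ins, the limit value is a placeholder, and the helper only
imitates what the kernel's WARN_ON_ONCE() does (report the first hit, then
hand the condition back to the caller).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholder limit standing in for DRM_I915_GEM_OBJECT_MAX_PIN_COUNT. */
#define MAX_PIN_COUNT 15

/* Hypothetical, stripped-down object; the real driver struct is much larger. */
struct gem_object {
    int pin_count;
};

/*
 * Userspace imitation of WARN_ON_ONCE(): print a warning the first time
 * the condition is true, stay quiet afterwards, and always return the
 * condition so the caller can bail out.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/*
 * Instead of killing the process when the pin count is already at its
 * limit (the BUG_ON() behaviour), warn once and fail this one request.
 */
static int pin_object(struct gem_object *obj)
{
    static bool warned; /* one flag for this call site */

    if (warn_on_once(obj->pin_count == MAX_PIN_COUNT, &warned,
                     "pin_count already at maximum"))
        return -ENOMEM;

    obj->pin_count++;
    return 0;
}
```

Calling pin_object() on an object that is already at the limit returns
-ENOMEM and leaves pin_count untouched, so the caller keeps running and the
first warning stays visible in the log rather than being lost to a dead
machine.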

                     Linus
