linux-kernel - Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

Message-Id: <200802051002.28051.nickpiggin@yahoo.com.au>
Date:	Tue, 5 Feb 2008 10:02:27 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Mike Galbraith <efault@....de>
Cc:	Frederik Himpe <fhimpe@...enet.be>, linux-kernel@...r.kernel.org
Subject: Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

On Tuesday 05 February 2008 01:49, Mike Galbraith wrote:
> On Tue, 2008-01-22 at 06:47 +0100, Mike Galbraith wrote:
> > On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote:
> > > On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:
> > > > I've hit same twice recently (not pan, and not repeatable).
> > >
> > > Nasty. The attached patch is something really simple that can sometimes
> > > help. sysrq+p is also an option, if you're on a UP system.
> >
> > SMP (P4/HT imitating real cores)
> >
> > > Any luck getting traces?
> >
> > We'll see.  Armed.
>
> Hm.  ld just went loopy (but killable) in v2.6.24-6928-g9135f19.  During
> kbuild, modpost segfaulted, restart build, ld goes gaga.  Third attempt,
> build finished.  Not what I hit before, but mentionable.
>
>
> [  674.589134] modpost[18588]: segfault at 3e8dc42c ip 0804a96d sp af982920 error 5 in modpost[8048000+9000]
> [  674.589211] mm/memory.c:115: bad pgd 3e081163.
> [  674.589214] mm/memory.c:115: bad pgd 3e0d2163.
> [  674.589217] mm/memory.c:115: bad pgd 3eb01163.

Hmm, this _could_ be bad memory. Or if it is very easy to reproduce with
a particular kernel version, then it is probably a memory scribble from
another part of the kernel :(
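
For reading the quoted segfault line, here is a small sketch that decodes
the "error 5" value. It assumes the usual x86 page-fault error code layout
(bit 0: protection fault vs. page not present, bit 1: write vs. read,
bit 2: user vs. kernel mode); the function name is made up for illustration.

```python
# Low three bits of the x86 page-fault error code, as printed in the
# "segfault at ... error N" line. Each entry: (mask, meaning if set,
# meaning if clear).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_pf_error(code):
    """Return a human-readable description for each of the low three bits."""
    return [set_desc if code & mask else clear_desc
            for mask, set_desc, clear_desc in PF_BITS]


# Decode the value from the modpost segfault line.
for description in decode_pf_error(5):
    print(description)
```

Under that assumption, error 5 is a user-mode read that hit a present page
and was refused on protection grounds, not a plain access to an unmapped
address.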

I guess the first thing, which would be easy and helpful, is to run
memtest86 for a while if you have time.

If that's clean, then I don't have another good option except to bisect
the problem. Turning on DEBUG_VM, DEBUG_SLAB, DEBUG_LIST, DEBUG_PAGEALLOC,
DEBUG_STACKOVERFLOW, DEBUG_RODATA might help catch it sooner... SLAB and
PAGEALLOC could slow you down quite a bit though. And if the problem is
quite reproducible, then obviously don't touch your config ;)
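
If you do decide to turn those options on, something like the following
can tell you which ones a given .config is still missing. This is only a
sketch: it assumes the options appear in .config with the usual CONFIG_
prefix and "=y" when built in, and the function name and sample text are
made up for illustration.

```python
# Debug options named above, without the CONFIG_ prefix.
DEBUG_OPTIONS = [
    "DEBUG_VM",
    "DEBUG_SLAB",
    "DEBUG_LIST",
    "DEBUG_PAGEALLOC",
    "DEBUG_STACKOVERFLOW",
    "DEBUG_RODATA",
]


def missing_debug_options(config_text, wanted=DEBUG_OPTIONS):
    """Return the wanted options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_FOO=y"; disabled ones are
        # comments such as "# CONFIG_FOO is not set" and are skipped.
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[len("CONFIG_"):-len("=y")])
    return [opt for opt in wanted if opt not in enabled]


# Hypothetical .config excerpt, just to show the call.
sample = """CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_DEBUG_LIST=y
"""
print(missing_debug_options(sample))
```

To check a real tree, pass the contents of its .config file instead of the
sample string.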

Thanks,
Nick


>
> [ 1407.322144]  =======================
> [ 1407.322144] ld            R running      0 21963  21962
> [ 1407.322144]        db9d7f1c 00200086 c75f9020 b1814300 b0428300 b0428300 b0428300 c75f9280
> [ 1407.322144]        b1814300 00000001 db9d7000 00000000 d08c2f90 dba4f300 00000002 00000000
> [ 1407.322144]        b1810120 dba4f334 00200046 ffffffff db9d7000 c75f9020 db9d7f30 b02f333f
> [ 1407.322144] Call Trace:
> [ 1407.322144]  [<b02f333f>] preempt_schedule_irq+0x45/0x5b
> [ 1407.322144]  [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144]  [<b0104d6e>] need_resched+0x1f/0x21
> [ 1407.322144]  [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144]  [<b0117a5c>] ? do_page_fault+0x4c/0x470
> [ 1407.322144]  [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144]  [<b02f4a3a>] ? error_code+0x72/0x78
> [ 1407.322144]  [<b02f0000>] ? init_transmeta+0xcf/0x22f <== zzt P4