Message-ID: <alpine.DEB.2.02.1108151450190.30763@p34.internal.lan>
Date: Mon, 15 Aug 2011 15:02:52 -0400 (EDT)
From: Justin Piszcz <jpiszcz@...idpixels.com>
To: Hugh Dickins <hughd@...gle.com>
cc: linux-kernel@...r.kernel.org
Subject: Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110
On Mon, 15 Aug 2011, Hugh Dickins wrote:
> On Mon, 15 Aug 2011, Justin Piszcz wrote:
>> Hello,
>>
>> What causes this(?) -- am I out of memory(?) or is this a kernel bug?
>
> It would be a kernel bug to lock up even if you are out of memory.
This machine has 48GB of RAM, and it's just a Linux router with a few gqview
instances running.
>
> It does look like you're under memory pressure, but I don't see any OOM.
>
> Is this something you've noticed just once, or does it happen repeatedly?
This has happened once before (I e-mailed LKML about it last weekend or
thereabouts, but nobody responded).
The earlier report is here:
http://lkml.org/lkml/2011/8/12/54 (the site may be down)
http://comments.gmane.org/gmane.linux.kernel/1178570
>
> Does it always hit somewhere in find_get_pages(), or does the loop span
> wider than that?
Per http://comments.gmane.org/gmane.linux.kernel/1178570, the earlier trace
(from August 12) is slightly different:
[330509.718763] Call Trace:
[330509.718771] [<ffffffff81089e15>] ? pagevec_lookup+0x15/0x20
[330509.718776] [<ffffffff8108b905>] ? invalidate_mapping_pages+0x55/0x130
[330509.718784] [<ffffffff810d6835>] ? shrink_icache_memory+0x2c5/0x310
[330509.718788] [<ffffffff8108c254>] ? shrink_slab+0x104/0x170
[330509.718793] [<ffffffff8108eda2>] ? balance_pgdat+0x492/0x600
[330509.718798] [<ffffffff8108efbc>] ? kswapd+0xac/0x250
[330509.718803] [<ffffffff81050fd0>] ? abort_exclusive_wait+0xb0/0xb0
[330509.718807] [<ffffffff8108ef10>] ? balance_pgdat+0x600/0x600
[330509.718811] [<ffffffff8105082e>] ? kthread+0x7e/0x90
[330509.718818] [<ffffffff815b4e14>] ? kernel_thread_helper+0x4/0x10
[330509.718822] [<ffffffff810507b0>] ? kthread_worker_fn+0x120/0x120
[330509.718825] [<ffffffff815b4e10>] ? gs_change+0xb/0xb
The first time, it happened while running a lot of I/O
(dumps and streams/backups over SSH).
>
> I'm answering out of interest in find_get_pages(): which does contain
> a number of gotos which could result in endless looping; except that
> they're all supposed to be for very transitory conditions which a
> second glance at the RCU-protected tree should correct.
I am using the 'Server' preemption model (no forced preemption), not
'Low-Latency Desktop', which exposes more bugs/problems.
>
> But if a radix_tree node got corrupted, then yes, it could loop forever.
>
> If it's repeatable, please try again with slab poisoning (and frame
> pointers) enabled?
I will enable frame pointers, wait for the next occurrence, and report
back if/when it recurs. Thanks!
Justin.
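For reference, the debugging Hugh asked for (slab poisoning plus frame pointers) maps onto a few kernel build options. The fragment below is a minimal sketch, assuming the SLUB allocator; the thread does not say which allocator this 3.0 kernel was built with. On a SLAB kernel, CONFIG_DEBUG_SLAB=y is the rough equivalent of the SLUB options.

```
# .config fragment (sketch; assumes the SLUB allocator is in use)
CONFIG_DEBUG_KERNEL=y       # FRAME_POINTER depends on DEBUG_KERNEL
CONFIG_FRAME_POINTER=y      # more reliable call traces in soft-lockup reports
CONFIG_SLUB_DEBUG=y         # compile in SLUB debugging support
CONFIG_SLUB_DEBUG_ON=y      # turn SLUB debugging (incl. poisoning) on at boot
```

As an alternative to CONFIG_SLUB_DEBUG_ON, a kernel that already has CONFIG_SLUB_DEBUG built in can enable poisoning alone by booting with `slub_debug=P`, which avoids the overhead of the other debug checks.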