Message-ID: <CAF1ivSbCb-eS=npyHO+iWuAXfKPjnCD+LZDWQQ9SbTxGg3nq7Q@mail.gmail.com>
Date: Wed, 14 Sep 2011 08:34:21 +0800
From: Lin Ming <mlin@...pku.edu.cn>
To: Andrew Morton <akpm@...gle.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Toshiyuki Okajima <toshi.okajima@...fujitsu.com>,
Dave Chinner <david@...morbit.com>,
Hugh Dickins <hughd@...gle.com>, Pawel Sikora <pluto@...k.net>,
Justin Piszcz <jpiszcz@...idpixels.com>
Subject: Re: [BUG] infinite loop in find_get_pages()

On Wed, Sep 14, 2011 at 7:53 AM, Andrew Morton <akpm@...gle.com> wrote:
> On Tue, 13 Sep 2011 21:23:21 +0200
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>> Linus,
>>
>> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
>> expect too much from them.
>>
>> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
>> have a cpu locked in
>>
>> find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>>
>>
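A quick way to confirm exactly where that CPU is spinning is to grab a
backtrace of the active CPUs while the lockup is in progress. A rough
sketch, assuming sysrq and /proc/<pid>/stack (CONFIG_STACKTRACE) are
available on your kernel:

# dump a backtrace of all active CPUs to the kernel log
# (assumes sysrq is not disabled by distro policy)
echo 1 > /proc/sys/kernel/sysrq
echo l > /proc/sysrq-trigger
dmesg | tail -n 60

# or, once the spinning task's pid is known:
cat /proc/<pid>/stack
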
>> Problem is: a bisection will be very hard, since a lot of kernels
>> simply destroy my disk (the PCI MRRS horror stuff).
>
> Yes, that's hard. Quite often my bisection efforts involve moving to a
> new bisection point then hand-applying a few patches to make the
> thing compile and/or work.
>
> There have only been three commits to radix-tree.c this year, so a bit
> of manual searching through those would be practical?
>
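For reference, those candidate commits can be listed with something like
the following (a sketch; adjust the date range to match the kernel under
test):

# commits touching the radix tree code this year
# (adjust --since / path as needed)
cd /usr/src/linux
git log --oneline --since=2011-01-01 -- lib/radix-tree.c
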
>> Messages at console :
>>
>> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
>> 11 t=60002 jiffies)
>>
>> perf top -C 1
>>
>> Events: 3K cycles
>> + 43,08% bash [kernel.kallsyms] [k] __lookup
>> + 41,51% bash [kernel.kallsyms] [k] find_get_pages
>> + 15,31% bash [kernel.kallsyms] [k] radix_tree_gang_lookup_slot
>>
>> 43.08% bash [kernel.kallsyms] [k] __lookup
>> |
>> --- __lookup
>> |
>> |--97.09%-- radix_tree_gang_lookup_slot
>> | find_get_pages
>> | pagevec_lookup
>> | invalidate_mapping_pages
>> | drop_pagecache_sb
>> | iterate_supers
>> | drop_caches_sysctl_handler
>> | proc_sys_call_handler.isra.3
>> | proc_sys_write
>> | vfs_write
>> | sys_write
>> | system_call_fastpath
>> | __write
>> |
>>
>>
>> Steps to reproduce :
>>
>> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>>
>> cd /usr/src/linux
>> while :
>> do
>> make clean
>> make -j128
>> done
>>
>>
>> In another term :
>>
>> while :
>> do
>> echo 3 >/proc/sys/vm/drop_caches
>> sleep 20
>> done
>>
>
> This is a regression? 3.0 is OK?

FYI, others have reported similar bugs against 3.0 and 3.1-rc as well:
kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110
http://marc.info/?l=linux-kernel&m=131342662028153&w=2
[3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
http://marc.info/?l=linux-kernel&m=131469584117857&w=2
kernel 3.1-rc4: BUG soft lockup (w/frame pointers enabled)
http://marc.info/?l=linux-kernel&m=131566383719422&w=2

Lin Ming
>
> Also, do you know that the hang is happening at the radix-tree level?
> It might be at the filemap.c level or at the superblock level and we
> just end up spending most cycles at the lower levels because they're
> called so often? The iterate_supers/drop_pagecache_sb code is fairly
> recent.
>
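One way to narrow that down would be to watch whether find_get_pages()
ever returns on the stuck CPU: if it never completes, the loop is inside
the radix-tree lookup; if it keeps completing, the non-terminating loop
is higher up (pagevec_lookup / invalidate_mapping_pages). A rough sketch
with the function_graph tracer, assuming debugfs is mounted at
/sys/kernel/debug and ftrace is built in:

# assumes CONFIG_FUNCTION_GRAPH_TRACER and debugfs at /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo 2 > tracing_cpumask            # hex cpumask: trace only CPU 1
echo find_get_pages > set_graph_function
echo function_graph > current_tracer
cat trace_pipe | head -n 100        # an entry with no matching return
                                    # means we are stuck inside the call
echo nop > current_tracer           # stop tracing when done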