linux-kernel - Re: [BUG] infinite loop in find_get

Message-ID: <CAF1ivSbCb-eS=npyHO+iWuAXfKPjnCD+LZDWQQ9SbTxGg3nq7Q@mail.gmail.com>
Date:	Wed, 14 Sep 2011 08:34:21 +0800
From:	Lin Ming <mlin@...pku.edu.cn>
To:	Andrew Morton <akpm@...gle.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Toshiyuki Okajima <toshi.okajima@...fujitsu.com>,
	Dave Chinner <david@...morbit.com>,
	Hugh Dickins <hughd@...gle.com>, Pawel Sikora <pluto@...k.net>,
	Justin Piszcz <jpiszcz@...idpixels.com>
Subject: Re: [BUG] infinite loop in find_get_pages()

On Wed, Sep 14, 2011 at 7:53 AM, Andrew Morton <akpm@...gle.com> wrote:
> On Tue, 13 Sep 2011 21:23:21 +0200
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>> Linus,
>>
>> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
>> expect too much from them.
>>
>> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
>> have a cpu locked in
>>
>>  find_get_pages -> radix_tree_gang_lookup_slot -> __lookup
>>
>>
>> Problem is : A bisection will be very hard, since a lot of kernels
>> simply destroy my disk (the PCI MRRS horror stuff).
>
> Yes, that's hard.  Quite often my bisection efforts involve moving to a
> new bisection point then hand-applying a few patches to make the
> thing compile and/or work.
>
> There have only been three commits to radix-tree.c this year, so a bit
> of manual searching through those would be practical?
>
>> Messages at console :
>>
>> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
>> 11 t=60002 jiffies)
>>
>> perf top -C 1
>>
>> Events: 3K cycles
>> +     43,08%  bash  [kernel.kallsyms]  [k] __lookup
>> +     41,51%  bash  [kernel.kallsyms]  [k] find_get_pages
>> +     15,31%  bash  [kernel.kallsyms]  [k] radix_tree_gang_lookup_slot
>>
>>     43.08%     bash  [kernel.kallsyms]  [k] __lookup
>>                |
>>                --- __lookup
>>                   |
>>                   |--97.09%-- radix_tree_gang_lookup_slot
>>                   |          find_get_pages
>>                   |          pagevec_lookup
>>                   |          invalidate_mapping_pages
>>                   |          drop_pagecache_sb
>>                   |          iterate_supers
>>                   |          drop_caches_sysctl_handler
>>                   |          proc_sys_call_handler.isra.3
>>                   |          proc_sys_write
>>                   |          vfs_write
>>                   |          sys_write
>>                   |          system_call_fastpath
>>                   |          __write
>>                   |
>>
>>
>> Steps to reproduce :
>>
>> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
>>
>> cd /usr/src/linux
>> while :
>> do
>>  make clean
>>  make -j128
>> done
>>
>>
>> In another term :
>>
>> while :
>> do
>>  echo 3 >/proc/sys/vm/drop_caches
>>  sleep 20
>> done
>>
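For anyone trying to reproduce this, a bounded variant of the second loop may be easier to leave unattended. This is only a sketch: the function name `drop_loop`, its three arguments, and the iteration limit are assumptions made for illustration and are not part of Eric's report. Writing to /proc/sys/vm/drop_caches needs root.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write "3" to TARGET
# COUNT times, sleeping INTERVAL seconds after each write, then print the
# number of writes that completed.
drop_loop() {
    target=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same write as the original loop; if the kernel locks up in
        # find_get_pages, the shell will be stuck on this line.
        echo 3 > "$target" || return 1
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$i"
}

# Example, as root, matching the 20 second interval used above:
#   drop_loop /proc/sys/vm/drop_caches 100 20
```

Taking the target path as an argument is just a convenience so the loop can be dry-run against an ordinary file before pointing it at the real sysctl.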
>
> This is a regression?  3.0 is OK?

FYI, other people have reported similar soft lockups on 3.0, and one on 3.1-rc4:

kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110
http://marc.info/?l=linux-kernel&m=131342662028153&w=2

[3.0.2-stable] BUG: soft lockup - CPU#13 stuck for 22s! [kswapd2:1092]
http://marc.info/?l=linux-kernel&m=131469584117857&w=2

kernel 3.1-rc4: BUG soft lockup (w/frame pointers enabled)
http://marc.info/?l=linux-kernel&m=131566383719422&w=2

Lin Ming

>
> Also, do you know that the hang is happening at the radix-tree level?
> It might be at the filemap.c level or at the superblock level and we
> just end up spending most cycles at the lower levels because they're
> called so often?  The iterate_supers/drop_pagecache_sb code is fairly
> recent.
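One rough way to approach that question is to take several backtraces of the stuck CPU and see which functions show up in every one of them. If the lower frames come and go but a higher-level function is always present, the loop is probably at or above that function. The sketch below assumes the samples have already been saved to files, one function name per line. How they are collected (for example from sysrq-l backtraces) is left open, and the helper name `common_frames` is hypothetical. It is a heuristic, not a proof.

```shell
#!/bin/sh
# Hypothetical helper: given several files, each holding one stack sample
# (one function name per line), print the functions present in every sample.
common_frames() {
    n=$#
    for f in "$@"; do
        # De-duplicate within a sample so each function counts once per file.
        sort -u "$f"
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}

# Example: common_frames sample1.txt sample2.txt sample3.txt
```

The output is sorted by name rather than by call depth, so it has to be read against one of the full backtraces to see where in the call chain the common functions sit.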
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/