Message-Id: <20110913165344.1d800582.akpm@google.com>
Date:	Tue, 13 Sep 2011 16:53:44 -0700
From:	Andrew Morton <akpm@...gle.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Toshiyuki Okajima <toshi.okajima@...fujitsu.com>,
	Dave Chinner <david@...morbit.com>,
	Hugh Dickins <hughd@...gle.com>
Subject: Re: [BUG] infinite loop in find_get_pages()

On Tue, 13 Sep 2011 21:23:21 +0200
Eric Dumazet <eric.dumazet@...il.com> wrote:

> Linus,
> 
> It seems current kernels (3.1.0-rc6) are really unreliable, or maybe I
> expect too much from them.
> 
> On my 4GB x86_64 machine (2 quad-core cpus, 2 threads per core), I can
> have a cpu locked in
> 
>  find_get_pages -> radix_tree_gang_lookup_slot -> __lookup 
> 
> 
> Problem is : A bisection will be very hard, since a lot of kernels
> simply destroy my disk (the PCI MRRS horror stuff).

Yes, that's hard.  Quite often my bisection efforts involve moving to a
new bisection point then hand-applying a few patches to make the
thing compile and/or work.

There have only been three commits to radix-tree.c this year, so a bit
of manual searching through those would be practical?
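
For illustration, a minimal sketch of how those commits could be listed.
It assumes a git checkout of the kernel (here /usr/src/linux, as in the
reproduction steps) and that the file lives at lib/radix-tree.c; the
helper name commits_touching is made up for this example.

```shell
# Hypothetical helper: list the commits that touched one file since a date.
# Arguments: <repo dir> <since date> <path within the repo>
commits_touching() {
    git -C "$1" log --since="$2" --date=short \
        --format='%h %ad %s' -- "$3"
}

# Example invocation (paths are assumptions, adjust to the local tree):
# commits_touching /usr/src/linux 2011-01-01 lib/radix-tree.c
```

Each line of output is one candidate; "git show <hash>" on each would
then be enough to read through them by hand.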

> Messages at console :
>  
> INFO: rcu_preempt_state detected stalls on CPUs/tasks: {} (detected by
> 11 t=60002 jiffies)
> 
> perf top -C 1
> 
> Events: 3K cycles
> +     43,08%  bash  [kernel.kallsyms]  [k] __lookup
> +     41,51%  bash  [kernel.kallsyms]  [k] find_get_pages
> +     15,31%  bash  [kernel.kallsyms]  [k] radix_tree_gang_lookup_slot
> 
>     43.08%     bash  [kernel.kallsyms]  [k] __lookup
>                |
>                --- __lookup
>                   |          
>                   |--97.09%-- radix_tree_gang_lookup_slot
>                   |          find_get_pages
>                   |          pagevec_lookup
>                   |          invalidate_mapping_pages
>                   |          drop_pagecache_sb
>                   |          iterate_supers
>                   |          drop_caches_sysctl_handler
>                   |          proc_sys_call_handler.isra.3
>                   |          proc_sys_write
>                   |          vfs_write
>                   |          sys_write
>                   |          system_call_fastpath
>                   |          __write
>                   |          
> 
> 
> Steps to reproduce :
> 
> In one terminal, kernel builds in a loop (defconfig + hpsa driver)
> 
> cd /usr/src/linux
> while :
> do
>  make clean
>  make -j128
> done
> 
> 
> In another term :
> 
> while :
> do
>  echo 3 >/proc/sys/vm/drop_caches
>  sleep 20
> done
> 

This is a regression?  3.0 is OK?

Also, do you know that the hang is happening at the radix-tree level? 
It might be at the filemap.c level or at the superblock level and we
just end up spending most cycles at the lower levels because they're
called so often?  The iterate_supers/drop_pagecache_sb code is fairly
recent.


