Date:	Fri, 6 Jan 2012 10:46:58 +0000
From:	Mel Gorman <mel@....ul.ie>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Russell King - ARM Linux <linux@....linux.org.uk>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Gilad Ben-Yossef <gilad@...yossef.com>,
	linux-kernel@...r.kernel.org, Chris Metcalf <cmetcalf@...era.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Frederic Weisbecker <fweisbec@...il.com>, linux-mm@...ck.org,
	Pekka Enberg <penberg@...nel.org>,
	Matt Mackall <mpm@...enic.com>,
	Sasha Levin <levinsasha928@...il.com>,
	Rik van Riel <riel@...hat.com>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Greg KH <gregkh@...e.de>, linux-fsdevel@...r.kernel.org,
	Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist

On Fri, Jan 06, 2012 at 11:36:11AM +0530, Srivatsa S. Bhat wrote:
> On 01/06/2012 03:51 AM, Mel Gorman wrote:
> 
> > (Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
> > suffering from recursive locking recently)
> > 
> > On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> >> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> >>> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> >>>> Link please?
> >>>
> >>> Forwarded, as it's still in my mailbox.
> >>>
> >>>> I'm including a patch below under development that is
> >>>> intended to only cope with the page allocator case under heavy memory
> >>>> pressure. Currently it does not pass testing because eventually RCU
> >>>> gets stalled with the following trace
> >>>>
> >>>> [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> >>>> [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> >>>> [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> >>>> [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> >>>> [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> >>>> [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> >>>> [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> >>>> [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> >>>> [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> >>>> [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> >>>> [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> >>>> [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> >>>> [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> >>>> [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> >>>> [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> >>>> [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> >>>> [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> >>>>
> >>>> It might be a separate bug, don't know for sure.
> >>
> > 
> > I rebased the patch on top of 3.2 and tested again with a bunch of
> > debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
> > hotplug is a lot more reliable and less likely to hang but eventually
> > gets into trouble.
> > 
> 
> I was running some CPU hotplug stress tests recently and found it to be
> problematic too. Mel, I have some logs from those tests which appear very
> relevant to the "IPI to offline CPU" issue that has been discussed in this
> thread.
> 
> Kernel: 3.2-rc7
> Here is the log: 
> (Unfortunately I couldn't capture the log intact, due to some annoying
> serial console issues, but I hope this log is good enough to analyze.)
>   

Ok, it looks vaguely similar to what I'm seeing. I think I spotted
the sysfs problem as well and am testing a series. I'll add you to
the cc if it passes tests locally.

Thanks.

-- 
Mel Gorman
SUSE Labs