Message-ID: <20100906215023.GC7362@dastard>
Date:	Tue, 7 Sep 2010 07:50:23 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Mel Gorman <mel@....ul.ie>
Cc:	Wu Fengguang <fengguang.wu@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel List <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Minchan Kim <minchan.kim@...il.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after
 direct reclaim allocation fails

On Mon, Sep 06, 2010 at 09:40:15AM +0100, Mel Gorman wrote:
> On Mon, Sep 06, 2010 at 02:02:43PM +1000, Dave Chinner wrote:
> > however, here are the fs_mark processes:
> > 
> > [  596.628086] fs_mark       R  running task        0  2373   2163 0x00000008
> > [  596.628086]  0000000000000000 ffffffff81bb8610 00000000000008fc 0000000000000002
> > [  596.628086]  0000000000000000 0000000000000296 0000000000000297 ffffffffffffff10
> > [  596.628086]  ffffffff810b48c2 0000000000000010 0000000000000202 ffff880116b61798
> > [  596.628086] Call Trace:
> > [  596.628086]  [<ffffffff810b48c2>] ? smp_call_function_many+0x1a2/0x210
> > [  596.628086]  [<ffffffff810b48a5>] ? smp_call_function_many+0x185/0x210
> > [  596.628086]  [<ffffffff81109ff0>] ? drain_local_pages+0x0/0x20
> > [  596.628086]  [<ffffffff810b4952>] ? smp_call_function+0x22/0x30
> > [  596.628086]  [<ffffffff81084934>] ? on_each_cpu+0x24/0x50
> > [  596.628086]  [<ffffffff81108a8c>] ? drain_all_pages+0x1c/0x20
> > [  596.628086]  [<ffffffff81108fad>] ? __alloc_pages_nodemask+0x42d/0x700
> > [  596.628086]  [<ffffffff8113d0f2>] ? kmem_getpages+0x62/0x160
> > [  596.628086]  [<ffffffff8113dce6>] ? fallback_alloc+0x196/0x240
> > [  596.628086]  [<ffffffff8113da68>] ? ____cache_alloc_node+0x98/0x180
> > [  596.628086]  [<ffffffff8113e643>] ? __kmalloc+0x193/0x230
> > [  596.628086]  [<ffffffff8131083f>] ? kmem_alloc+0x8f/0xe0
> > [  596.628086]  [<ffffffff8131083f>] ? kmem_alloc+0x8f/0xe0
> > [  596.628086]  [<ffffffff8131092e>] ? kmem_zalloc+0x1e/0x50
> > [  596.628086]  [<ffffffff812fac80>] ? xfs_log_commit_cil+0x500/0x590
> > [  596.628086]  [<ffffffff81310943>] ? kmem_zalloc+0x33/0x50
> 
> This looks like an order-0 allocation. The "Drain per-cpu lists after
> direct reclaim allocation fails" patch avoids calling drain_all_pages()
> for a number of cases but introduces a case where it's called for
> order-0 pages. The intention was to avoid allocations failing just
> because of the per-cpu lists, but maybe it's happening too often.

Yes, that should be an order-0 allocation. Possibly an order-1
allocation, but that's unlikely.
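
For reference, the retry logic being discussed looks roughly like this --
a sketch reconstructed from the patch description and the 2.6.36-era
direct reclaim slow path in mm/page_alloc.c, not the literal patch text:

static struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
	struct zonelist *zonelist, enum zone_type high_zoneidx,
	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
	int migratetype, unsigned long *did_some_progress)
{
	struct page *page = NULL;
	bool drained = false;

	*did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
					       nodemask);
	if (unlikely(!(*did_some_progress)))
		return NULL;

retry:
	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
				      high_zoneidx, alloc_flags,
				      preferred_zone, migratetype);

	/*
	 * If the allocation still fails after direct reclaim made progress,
	 * the free pages may be sitting on the per-cpu lists.  Drain them
	 * once and retry.  drain_all_pages() is the on_each_cpu() /
	 * smp_call_function_many() at the top of the trace above: an IPI to
	 * every online CPU, even for an order-0 allocation.
	 */
	if (!page && !drained) {
		drain_all_pages();
		drained = true;
		goto retry;
	}

	return page;
}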

> I include a patch at the very end of this mail that might relieve this.

Ok, I'll try it later today.

> > I just went to grab the CAL counters, and found the system in
> > another livelock.  This time I managed to start the sysrq-trigger
> > dump while the livelock was in progress - I basically got one shot
> > at a command before everything stopped responding. Now I'm waiting
> > for the livelock to pass.... 5min.... the fs_mark workload
> > has stopped (ctrl-c finally responded), still livelocked....
> > 10min.... 15min.... 20min.... OK, back now.
> > 
> > Interesting - all the fs_mark processes are in D state waiting on IO
> > completion processing.
> 
> Very interesting, maybe they are all stuck in congestion_wait() this
> time? There are a few sources where that is possible.

No, they are waiting on log IO completion, not doing allocation or
in the VM at all.  They're stuck in xlog_get_iclog_state() waiting for
all the log IO buffers to be processed, which are stuck behind the
inode buffer IO completions in the kworker threads that I posted.

This is potentially caused by the kworker thread consolidation - log
IO completion processing used to be in a separate workqueue for
processing-latency and deadlock-prevention reasons: data and
metadata IO completion can block, whereas we need the log IO
completion to occur as quickly as possible. I've seen one deadlock
that the separate work queues solved w.r.t. loop devices, and I
suspect that part of the problem here is that transaction completion
cannot occur (and free the memory it and the CIL hold) because log IO
completion processing is being delayed significantly by metadata IO
completion...
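
To illustrate the kind of separation I mean, something along these
lines -- the queue names, flags and function name here are illustrative
only, not the actual XFS initialisation code:

/*
 * Sketch only: give log I/O completion its own high-priority workqueue so
 * its work items are never queued behind data/metadata I/O completion work
 * that can block.
 */
static struct workqueue_struct *xfs_log_wq;	/* log I/O completion only */
static struct workqueue_struct *xfs_data_wq;	/* data/metadata I/O completion */

static int xfs_alloc_completion_workqueues(void)
{
	/* WQ_HIGHPRI: log completion work is dispatched ahead of normal work */
	xfs_log_wq = alloc_workqueue("xfslogd", WQ_HIGHPRI, 1);
	if (!xfs_log_wq)
		return -ENOMEM;

	xfs_data_wq = alloc_workqueue("xfsdatad", 0, 0);
	if (!xfs_data_wq) {
		destroy_workqueue(xfs_log_wq);
		return -ENOMEM;
	}
	return 0;
}

Completion handlers would then queue_work() to whichever queue matches
their latency requirements, so a slow metadata completion can never hold
up the log.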

> > A second set of
> > traces I got during the livelock also showed this:
....
> > 
> > Because I tried to ctrl-c the fs_mark workload. All those lock
> > traces on the stack aren't related to XFS, so I'm wondering exactly
> > where they have come from....
> > 
> > Finally, /proc/interrupts shows:
> > 
> > CAL:      12156      12039      12676      12478      12919    12177      12767      12460   Function call interrupts
> > 
> > Which shows that this wasn't an IPI storm that caused this
> > particular livelock.
> 
> No, but it's possible we got stuck somewhere like too_many_isolated() or
> in congestion_wait(). One thing at a time though, would you mind testing
> the following patch? I haven't tested this *at all* but it should reduce
> the number of times drain_all_pages() is called further while not
> eliminating those calls entirely.

Ok, I'll try it later today, but first I think I need to do some
deeper investigation on the kworker thread behaviour....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
