Message-Id: <20091201102645.5C0A.A69D9226@jp.fujitsu.com>
Date: Tue, 1 Dec 2009 21:23:23 +0900 (JST)
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To: Larry Woodman <lwoodman@...hat.com>
Cc: kosaki.motohiro@...fujitsu.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, akpm@...ux-foundation.org,
Hugh Dickins <hugh.dickins@...cali.co.uk>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Rik van Riel <riel@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [RFC] high system time & lock contention running large mixed workload
(Cc'ing some related people.)
> The cause was determined to be the unconditional call to
> page_referenced() for every mapped page encountered in
> shrink_active_list(). page_referenced() takes the anon_vma->lock and
> calls page_referenced_one() for each vma. page_referenced_one() then
> calls page_check_address() which takes the pte_lockptr spinlock. If
> several CPUs are doing this at the same time there is a lot of
> pte_lockptr spinlock contention with the anon_vma->lock held. This
> causes contention on the anon_vma->lock, stalling in the fo and very
> high system time.
>
> Before the splitLRU patch shrink_active_list() would only call
> page_referenced() when reclaim_mapped got set. reclaim_mapped only got
> set when the priority worked its way from 12 all the way to 7. This
> prevented page_referenced() from being called from shrink_active_list()
> until the system was really struggling to reclaim memory.
>
> One way to prevent this is to change page_check_address() to execute a
> spin_trylock(ptl) when it was called by shrink_active_list() and simply
> fail if it could not get the pte_lockptr spinlock. This will make
> shrink_active_list() consider the page not referenced and allow the
> anon_vma->lock to be dropped much quicker.
>
> The attached patch does just that, thoughts???
At first glance:
 - We certainly have to fix this issue.
 - But your patch is a bit risky.

Your patch treats a trylock(pte-lock) failure as "not accessed". But
lock contention generally implies that a contending peer exists; IOW, the
page typically has its reference bit set. Then the next
shrink_inactive_list() moves it back to the active list. That's a
suboptimal result.

However, we can't simply treat lock contention as page-is-referenced
either. If we did, the system would easily go into OOM.
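
To make the trade-off concrete, here is a minimal userspace sketch (not
kernel code) of a reference check that gives up on a contended lock.
struct pte_model, pte_init(), pte_trylock(), pte_unlock() and
referenced_trylock() are hypothetical names invented for this
illustration; the real code path is page_referenced_one() and
page_check_address(). The sketch only shows the behaviour discussed
above: on contention the page is reported as not referenced, even
though its reference bit may well be set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one pte and its lock. */
struct pte_model {
	atomic_flag lock;	/* models the pte_lockptr spinlock */
	bool referenced;	/* models the pte accessed (young) bit */
};

static void pte_init(struct pte_model *pte, bool referenced)
{
	atomic_flag_clear(&pte->lock);
	pte->referenced = referenced;
}

/* Returns true if the lock was taken, false if somebody else holds it. */
static bool pte_trylock(struct pte_model *pte)
{
	return !atomic_flag_test_and_set_explicit(&pte->lock,
						  memory_order_acquire);
}

static void pte_unlock(struct pte_model *pte)
{
	atomic_flag_clear_explicit(&pte->lock, memory_order_release);
}

/*
 * Test-and-clear the reference bit, but never wait for the lock.
 * On contention the bit is left untouched and the page is reported
 * as "not referenced", which is the risky part of this approach.
 */
static bool referenced_trylock(struct pte_model *pte)
{
	bool was_referenced;

	if (!pte_trylock(pte))
		return false;

	was_referenced = pte->referenced;
	pte->referenced = false;
	pte_unlock(pte);
	return was_referenced;
}
```

In this model a page whose lock is held by a peer keeps its reference
bit, so a later scan that does get the lock would still see it as
referenced and reactivate it.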
So, would

	if (priority < DEF_PRIORITY - 2)
		page_referenced()
	else
		page_referenced_trylock()

be better?
On typical workloads, vmscan almost always runs at DEF_PRIORITY only. So
if the priority == DEF_PRIORITY case doesn't cause heavy lock contention,
the system doesn't need to worry about the contention. Anyway, we can't
avoid contention if the system is under heavy memory pressure.
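
The priority threshold proposed above can be spelled out as a small
standalone sketch. enum ref_check and choose_ref_check() are
hypothetical names used only for illustration, and DEF_PRIORITY is
assumed to be 12, matching the "from 12 all the way to 7" in the quoted
text. It is not a patch, just the decision rule in isolation.

```c
#include <assert.h>

#define DEF_PRIORITY 12	/* assumed: initial (lightest) scan priority */

/* Hypothetical labels for the two ways of checking references. */
enum ref_check {
	REF_CHECK_TRYLOCK,	/* give up on a contended pte lock */
	REF_CHECK_BLOCKING	/* wait for the pte lock as today */
};

/*
 * Lower priority values mean heavier memory pressure. Under light
 * pressure (DEF_PRIORITY down to DEF_PRIORITY - 2) avoid waiting for
 * the lock; once reclaim is struggling, pay for the accurate check.
 */
enum ref_check choose_ref_check(int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);

	if (priority < DEF_PRIORITY - 2)
		return REF_CHECK_BLOCKING;
	return REF_CHECK_TRYLOCK;
}
```

With these assumptions, priorities 12, 11 and 10 take the trylock path
and priorities 9 and below take the blocking path.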
BTW, the current shrink_active_list() has an unnecessary
page_mapping_inuse() call. It prevents dropping the page reference bit
from unmapped cache pages, which means we protect unmapped cache pages
more than mapped pages. That is strange.
Unfortunately, I don't have enough development time today. I'll
work on it tomorrow.