lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue,  1 Dec 2009 21:23:23 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Larry Woodman <lwoodman@...hat.com>
Cc:	kosaki.motohiro@...fujitsu.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, akpm@...ux-foundation.org,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [RFC] high system time & lock contention running large mixed workload

(cc to some related person)

> The cause was determined to be the unconditional call to
> page_referenced() for every mapped page encountered in
> shrink_active_list().  page_referenced() takes the anon_vma->lock and
> calls page_referenced_one() for each vma.  page_referenced_one() then
> calls page_check_address() which takes the pte_lockptr spinlock.   If
> several CPUs are doing this at the same time there is a lot of
> pte_lockptr spinlock contention with the anon_vma->lock held.  This
> causes contention on the anon_vma->lock, stalling in the fo and very
> high system time.
> 
> Before the splitLRU patch shrink_active_list() would only call
> page_referenced() when reclaim_mapped got set.  reclaim_mapped only got
> set when the priority worked its way from 12 all the way to 7. This
> prevented page_referenced() from being called from shrink_active_list()
> until the system was really struggling to reclaim memory.
> 
> On way to prevent this is to change page_check_address() to execute a
> spin_trylock(ptl) when it was called by shrink_active_list() and simply
> fail if it could not get the pte_lockptr spinlock.  This will make
> shrink_active_list() consider the page not referenced and allow the
> anon_vma->lock to be dropped much quicker.
> 
> The attached patch does just that, thoughts???

At first look,

   - We have to fix this issue certenally.
   - But your patch is a bit risky. 

Your patch treat trylock(pte-lock) failure as no accessced. but
generally lock contention imply to have contention peer. iow, the page
have reference bit typically. then, next shrink_inactive_list() move it
active list again. that's suboptimal result.

However, we can't treat lock-contention as page-is-referenced simply. if it does,
the system easily go into OOM.

So, 
	if (priority < DEF_PRIORITY - 2)
		page_referenced()
	else
		page_refenced_trylock()

is better?
On typical workload, almost vmscan only use DEF_PRIORITY. then,
if priority==DEF_PRIORITY situation don't cause heavy lock contention,
the system don't need to mind the contention. anyway we can't avoid
contention if the system have heavy memory pressure.

btw, current shrink_active_list() have unnecessary page_mapping_inuse() call.
it prevent to drop page reference bit from unmapped cache page. it mean
we protect unmapped cache page than mapped page. it is strange.

Unfortunately, I don't have enough development time today. I'll
working on tommorow.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ