Message-ID: <4EB874C4.4010706@jp.fujitsu.com>
Date:	Mon, 07 Nov 2011 19:16:04 -0500
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	jweiner@...hat.com
CC:	khlebnikov@...allels.com, penberg@...nel.org, linux-mm@...ck.org,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	fengguang.wu@...el.com, kamezawa.hiroyu@...fujitsu.com,
	hannes@...xchg.org, riel@...hat.com, mel@....ul.ie,
	minchan.kim@...il.com, gene.heskett@...il.com
Subject: Re: [rfc 1/3] mm: vmscan: never swap under low memory pressure

Hi,

Sorry for the delay. I was on a trip to San Jose last week.


> On Wed, Nov 02, 2011 at 10:54:23AM -0700, KOSAKI Motohiro wrote:
>>> ---
>>>  mm/vmscan.c |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a90c603..39d3da3 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -831,6 +831,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>>  		 * Try to allocate it some swap space here.
>>>  		 */
>>>  		if (PageAnon(page) && !PageSwapCache(page)) {
>>> +			if (priority >= DEF_PRIORITY - 2)
>>> +				goto keep_locked;
>>>  			if (!(sc->gfp_mask & __GFP_IO))
>>>  				goto keep_locked;
>>>  			if (!add_to_swap(page))
>>
>> Hehe, I tried a very similar approach a long time ago. Unfortunately, it doesn't work.
>> "DEF_PRIORITY - 2" is a really poor indicator of reclaim pressure. For example, if the
>> machine has 1TB of memory, DEF_PRIORITY-2 means scanning 1TB>>10 = 1GB. That's too big.
> 
> Do you remember what kind of tests you ran that demonstrated
> misbehaviour?
> 
> We can not reclaim anonymous pages without swapping, so the priority
> cutoff applies only to inactive file pages.  If you had 1TB of
> inactive file pages, the scanner would have to go through
> 
> 	((1 << (40 - 12)) >> 12) +
> 	((1 << (40 - 12)) >> 11) +
> 	((1 << (40 - 12)) >> 10) = 1792MB
> 
> without reclaiming SWAP_CLUSTER_MAX before it considers swapping.
> That's a lot of scanning but how likely is it that you have a TB of
> unreclaimable inactive cache pages?

I meant that the effect of this protection depends strongly on how much memory
the system has:

 - system memory is plentiful:
	the protection effectively disables swap-out completely.
 - system memory is not plentiful:
	the protection only gives a slight bonus against swap-out.

If people buy a new machine and move their legacy workload onto it, they
might be surprised by such a big behavior change. I'm worried about that.

That's why I dislike the DEF_PRIORITY based heuristic.


> Put into proportion, with a priority threshold of 10 a reclaimer will
> look at 0.17% ((n >> 12) + (n >> 11) + (n >> 10)) (excluding the list
> balance bias) of inactive file pages without reclaiming
> SWAP_CLUSTER_MAX before it considers swapping.

Moreover, I think we need a more precise analysis of why the unnecessary swap-out
happened: which factor is dominant, and when it occurs.


> Currently, the list balance biasing with each newly-added file page
> has much higher resistance to scan anonymous pages initially.  But
> once it shifted toward anon pages, all reclaimers will start swapping,
> unlike the priority threshold that each reclaimer has to reach
> individually.  Could this have been what was causing problems for you? 

Um. Currently the number of flusher threads is controlled by the kernel, but
the number of swap-out threads isn't limited at all. So our swap-out often
works too aggressively. I think we need to fix that.





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
