linux-kernel - Re: [PATCH v2] vmscan: limit concurrent reclaimers in shrink

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20091214210823.BBAE.A69D9226@jp.fujitsu.com>
Date:	Mon, 14 Dec 2009 21:22:24 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Rik van Riel <riel@...hat.com>
Cc:	kosaki.motohiro@...fujitsu.com, lwoodman@...hat.com,
	akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, minchan.kim@...il.com
Subject: Re: [PATCH v2] vmscan: limit concurrent reclaimers in shrink_zone

> Under very heavy multi-process workloads, like AIM7, the VM can
> get into trouble in a variety of ways.  The trouble start when
> there are hundreds, or even thousands of processes active in the
> page reclaim code.
> 
> Not only can the system suffer enormous slowdowns because of
> lock contention (and conditional reschedules) between thousands
> of processes in the page reclaim code, but each process will try
> to free up to SWAP_CLUSTER_MAX pages, even when the system already
> has lots of memory free.
> 
> It should be possible to avoid both of those issues at once, by
> simply limiting how many processes are active in the page reclaim
> code simultaneously.
> 
> If too many processes are active doing page reclaim in one zone,
> simply go to sleep in shrink_zone().
> 
> On wakeup, check whether enough memory has been freed already
> before jumping into the page reclaim code ourselves.  We want
> to use the same threshold here that is used in the page allocator
> for deciding whether or not to call the page reclaim code in the
> first place, otherwise some unlucky processes could end up freeing
> memory for the rest of the system.

This patch seems very similar to my old effort. afaik, there are another
two benefit.

1. Improve resource gurantee
   if thousands tasks start to vmscan at the same time, they eat all memory for
   PF_MEMALLOC. it might cause another dangerous problem. some filesystem
   and io device don't handle allocation failure properly.

2. Improve OOM contidion behavior
   Currently, vmscan don't handle SIGKILL at all. then if the system
   have hevy memory pressure, OOM killer can't kill the target process
   soon. it might cause OOM killer kill next innocent process.
   This patch can fix it.



> @@ -1600,6 +1612,31 @@ static void shrink_zone(int priority, struct zone *zone,
>  	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
>  	int noswap = 0;
>  
> +	if (!current_is_kswapd() && atomic_read(&zone->concurrent_reclaimers) >
> +				max_zone_concurrent_reclaimers &&
> +				(sc->gfp_mask & (__GFP_IO|__GFP_FS)) ==
> +				(__GFP_IO|__GFP_FS)) {
> +		/*
> +		 * Do not add to the lock contention if this zone has
> +		 * enough processes doing page reclaim already, since
> +		 * we would just make things slower.
> +		 */
> +		sleep_on(&zone->reclaim_wait);

Oops. this is bug. sleep_on() is not SMP safe.


I made few fixing patch today. I'll post it soon.


btw, following is mesurement result by hackbench.
================

unit: sec

parameter			old		new
130 (5200 processes)		5.463		4.442
140 (5600 processes)		479.357		7.792
150 (6000 processes)		729.640		20.529



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/