linux-kernel - Re: [PATCH] vmscan: bail out of page reclaim after swap_cluster

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <492FCFF6.1050808@redhat.com>
Date:	Fri, 28 Nov 2008 06:03:18 -0500
From:	Rik van Riel <riel@...hat.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
CC:	linux-mm@...ck.org, linux-kernel@...r.kernel.org, mel@....ul.ie,
	akpm@...ux-foundation.org
Subject: Re: [PATCH] vmscan: bail out of page reclaim after swap_cluster_max
 pages

KOSAKI Motohiro wrote:
> Hi
> 
> I mesured some data in this week and I got some interesting data.

> Rik patch improve about 30% and my patch improve 20% more.
> totally, We got about 50% improvement.

Very interesting indeed!   I did not know there was this easy
a reproducer of the problem that my patch is trying to solve.

> 	rc6+stream	rvr		+kosaki
> 	----------------------------------------
> avg	274.8004	194.6495	132.7923
> std	184.4902365	111.5699478	75.88299814
> min	37.523		57.084		52.233
> max	588.74		382.376		319.115

Impressive.

> So, rik patch and my patch improve perfectly different reclaim aspect.
> In general, kernel reclaim processing has several key goals.
> 
>  (1) if system has droppable cache, system shouldn't happen oom kill.
>  (2) if system has avaiable swap space, system shouldn't happen 
>      oom kill as poosible as.
>  (3) if system has enough free memory, system shouldn't reclaim any page
>      at all.
>  (4) if memory pressure is lite, system shouldn't cause heavy reclaim 
>      latency to application.
> 
> rik patch improve (3), my (this mail) modification improve (4).

Actually, to achieve (3) we would want to skip zones with way
more than enough free memory in shrink_zones().  Kswapd already
skips zones like this in shrink_pgdat(), so we definately want
this change:

@@ -1519,6 +1519,9 @@ static void shrink_zones(int priority, s
                         if (zone_is_all_unreclaimable(zone) &&
                                                 priority != DEF_PRIORITY)
                                 continue;       /* Let kswapd poll it */
+                       if (zone_watermark_ok(zone, order, 
4*zone->pages_high,
+                                               end_zone, 0))
+                               continue;       /* Lots free already */
                         sc->all_unreclaimable = 0;
                 } else {
                         /*

I'm sending a patch with this right now :)

> Actually, kswapd background reclaim and direct reclaim have perfectly
> different purpose and goal.
> 
> background reclaim
>   - kswapd don't need latency overhead reducing.
>     it isn't observe from end user.
>   - kswapd shoudn't only increase free pages, but also should 
>     zone balancing.
> 
> foreground reclaim
>   - it used application task context.
>   - as possible as, it shouldn't increase latency overhead.
>   - this reclaiming purpose is to make the memory for _own_ taks.
>     for other tasks memory don't need to concern.
>     kswap does it.

I am not entirely convinced that breaking out of the loop early
in a zone is not harmful for direct reclaimers.  Maybe it works
fine, maybe it won't.

Or maybe direct reclaimers should start scanning the largest zone
first, so your change can be done with the lowest risk possible?

Having said that, the 20% additional performance achieved with
your changes is impressive.

> o remove priority==DEF_PRIORITY condision

This one could definately be worth considering.

However, looking at the changeset that was backed out in the
early 2.6 series suggests that it may not be the best idea.

> o shrink_zones() also should have bailing out feature.

This one is similar.  What are the downsides of skipping a
zone entirely, when that zone has pages that should be freed?

If it can lead to the VM reclaiming new pages from one zone,
while leaving old pages from another zone in memory, we can
greatly reduce the caching efficiency of the page cache.

> ---
>  mm/vmscan.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1469,7 +1469,7 @@ static void shrink_zone(int priority, st
>  		 * freeing target can get unreasonably large.
>  		 */
>  		if (sc->nr_reclaimed > sc->swap_cluster_max &&
> -		    priority < DEF_PRIORITY && !current_is_kswapd())
> +		    !current_is_kswapd())
>  			break;
>  	}
>  
> @@ -1534,6 +1534,8 @@ static void shrink_zones(int priority, s
>  		}
>  
>  		shrink_zone(priority, zone, sc);
> +		if (sc->nr_reclaimed > sc->swap_cluster_max)
> +			break;
>  	}
>  }

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/