lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110815064359.GD26978@dastard>
Date:	Mon, 15 Aug 2011 16:43:59 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Glauber Costa <glommer@...allels.com>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	containers@...ts.linux-foundation.org,
	Pavel Emelyanov <xemul@...allels.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Hugh Dickins <hughd@...gle.com>,
	Nick Piggin <npiggin@...nel.dk>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	James Bottomley <JBottomley@...allels.com>,
	Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH v3 1/4] factor out single-shrinker code

On Sun, Aug 14, 2011 at 07:13:49PM +0400, Glauber Costa wrote:
> While shrinking our caches, vmscan.c passes through all
> registered shrinkers, trying to free objects as it goes.
> 
> We would like to do that individually for some caches,
> like the dcache, when certain conditions apply (for
> example, when we reach a soon-to-exist maximum allowed size)
> 
> To avoid re-writing the same logic at more than one place,
> this patch factors out the shrink logic at shrink_one_shrinker(),
> that we can call from other places of the kernel.
> 
> Signed-off-by: Glauber Costa <glommer@...allels.com>
> CC: Dave Chinner <david@...morbit.com>
> CC: Eric Dumazet <eric.dumazet@...il.com>
> ---
>  include/linux/shrinker.h |    6 ++
>  mm/vmscan.c              |  185 ++++++++++++++++++++++++----------------------
>  2 files changed, 104 insertions(+), 87 deletions(-)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 790651b..c5db650 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -39,4 +39,10 @@ struct shrinker {
>  #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
>  extern void register_shrinker(struct shrinker *);
>  extern void unregister_shrinker(struct shrinker *);
> +
> +unsigned long shrink_one_shrinker(struct shrinker *shrinker,
> +				  struct shrink_control *shrink,
> +				  unsigned long nr_pages_scanned,                      
> +				  unsigned long lru_pages);
> +
>  #endif
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7ef6912..50dfc61 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -211,6 +211,102 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
>  }
>  
>  #define SHRINK_BATCH 128
> +unsigned long shrink_one_shrinker(struct shrinker *shrinker,
> +				  struct shrink_control *shrink,
> +				  unsigned long nr_pages_scanned,
> +				  unsigned long lru_pages)
> +{

This isn't the right place to cut the code apart. The act of
shrinking a cache is separate to the act of calculating how much to
shrink. you're taking the vmscan interface and trying to hack around
that when direct calls to a single shrinker is needed.....

> +	unsigned long ret = 0;
> +	unsigned long long delta;
> +	unsigned long total_scan;
> +	unsigned long max_pass;
> +	int shrink_ret = 0;
> +	long nr;
> +	long new_nr;
> +	long batch_size = shrinker->batch ? shrinker->batch
> +					  : SHRINK_BATCH;
> +
> +	/*
> +	 * copy the current shrinker scan count into a local variable
> +	 * and zero it so that other concurrent shrinker invocations
> +	 * don't also do this scanning work.
> +	 */
> +	do {
> +		nr = shrinker->nr;
> +	} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
> +
> +	total_scan = nr;
> +	max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> +	delta = (4 * nr_pages_scanned) / shrinker->seeks;
> +	delta *= max_pass;
> +	do_div(delta, lru_pages + 1);
> +	total_scan += delta;
> +	if (total_scan < 0) {
> +		printk(KERN_ERR "shrink_slab: %pF negative objects to "
> +		       "delete nr=%ld\n",
> +		       shrinker->shrink, total_scan);
> +		total_scan = max_pass;
> +	}
> +
> +	/*
> +	 * We need to avoid excessive windup on filesystem shrinkers
> +	 * due to large numbers of GFP_NOFS allocations causing the
> +	 * shrinkers to return -1 all the time. This results in a large
> +	 * nr being built up so when a shrink that can do some work
> +	 * comes along it empties the entire cache due to nr >>>
> +	 * max_pass.  This is bad for sustaining a working set in
> +	 * memory.
> +	 *
> +	 * Hence only allow the shrinker to scan the entire cache when
> +	 * a large delta change is calculated directly.
> +	 */
> +	if (delta < max_pass / 4)
> +		total_scan = min(total_scan, max_pass / 2);
> +
> +	/*
> +	 * Avoid risking looping forever due to too large nr value:
> +	 * never try to free more than twice the estimate number of
> +	 * freeable entries.
> +	 */
> +	if (total_scan > max_pass * 2)
> +		total_scan = max_pass * 2;
> +
> +	trace_mm_shrink_slab_start(shrinker, shrink, nr,
> +				nr_pages_scanned, lru_pages,
> +				max_pass, delta, total_scan);
> +

This bit from here:

> +	while (total_scan >= batch_size) {
> +		int nr_before;
> +
> +		nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> +		shrink_ret = do_shrinker_shrink(shrinker, shrink,
> +						batch_size);
> +		if (shrink_ret == -1)
> +			break;
> +		if (shrink_ret < nr_before)
> +			ret += nr_before - shrink_ret;
> +		count_vm_events(SLABS_SCANNED, batch_size);
> +		total_scan -= batch_size;
> +
> +		cond_resched();
> +	}

to here is the only common code. It takes a locked shrinker and a
set up scan_control structure. As it is, I've got a set of patches
that I'm working on that fix the garbage srhinker API and abstract
this section of the code correctly for external users. I'll try to
get that finished for review later this week.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ