Message-ID: <21fc45ea-b1c2-d274-9faf-428931347ca1@suse.cz>
Date:   Tue, 19 Oct 2021 10:48:55 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Pekka Enberg <penberg@...nel.org>,
        Jann Horn <jannh@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-kernel@...r.kernel.org, Roman Gushchin <guro@...com>
Subject: Re: [PATCH v2 1/2] mm, slub: change percpu partial accounting from
 objects to pages

Hm looks like I forgot To: Andrew

some kind of feedback from fellow slab maintainers also wouldn't hurt :)

Thanks.

On 10/12/21 15:46, Vlastimil Babka wrote:
> With CONFIG_SLUB_CPU_PARTIAL enabled, SLUB keeps a percpu list of partial
> slabs that can be promoted to the cpu slab when the previous one is depleted,
> without accessing the shared partial list. A slab can be added to this list
> 1) on refill of an empty list from get_partial_node() - once we really have
> to access the shared partial list, we acquire multiple slabs to amortize the
> cost of locking - and 2) on the first free to a previously full slab -
> instead of putting the slab on the shared partial list, we can more cheaply
> freeze it and put it on the per-cpu list.
> 
> To control how large a percpu partial list can grow for a kmem cache,
> set_cpu_partial() calculates a target number of free objects on each cpu's
> percpu partial list, and this can also be set by the sysfs file cpu_partial.
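> 
> For reference, the per-size targets calculated by set_cpu_partial() (quoted
> from the function as it looks after this patch) are:
> 
> 	if (!kmem_cache_has_cpu_partial(s))
> 		nr_objects = 0;
> 	else if (s->size >= PAGE_SIZE)
> 		nr_objects = 2;
> 	else if (s->size >= 1024)
> 		nr_objects = 6;
> 	else if (s->size >= 256)
> 		nr_objects = 13;
> 	else
> 		nr_objects = 30;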
> 
> However, the tracking of the actual number of objects is imprecise, in order
> to limit the overhead of cpu X freeing an object to a slab on the percpu
> partial list of cpu Y. Basically, the percpu partial slabs form a singly
> linked list, and when we add a new slab to the list with current head
> "oldpage", we set in the struct page of the slab we're adding:
> 
> page->pages = oldpage->pages + 1; // this is precise
> page->pobjects = oldpage->pobjects + (page->objects - page->inuse); // approximate
> page->next = oldpage;
> 
> Thus the real number of free objects in the slab (objects - inuse) is only
> determined at the moment of adding the slab to the percpu partial list, and
> further freeing doesn't update the pobjects counter nor propagate it to the
> current list head. As Jann reports [1], this can easily lead to large
> inaccuracies, where the target number of objects (up to 30 by default) can
> translate to the same number of (empty) slab pages on the list. In case 2)
> above, we put a slab with 1 free object on the list, thus only increasing
> page->pobjects by 1, even if there are subsequent frees on the same slab.
> Jann noticed this in practice and so did we [2] when investigating a
> significant increase of kmemcg usage after switching from SLAB to SLUB.
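> 
> An illustrative (not measured) example, assuming 32 objects per slab: in
> case 2) the newly added slab contributes just 1 to pobjects, and when its
> remaining 31 objects are freed afterwards, pobjects is never updated. After
> 30 such slabs, the head's pobjects reads 30, yet the list holds 30 empty
> pages, i.e. 30 * 32 = 960 actually free objects:
> 
> page->pobjects = oldpage->pobjects + (32 - 31); // contributes only 1
> // ... the remaining 31 objects of this slab are freed later; the slab is
> // now empty, but neither its pobjects nor the list head's is updated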
> 
> While this is no longer a problem in the kmemcg context thanks to the
> accounting rewrite in 5.9, the memory waste is still not ideal and it's
> questionable whether it makes sense to perform free-object-count-based
> control when object counts can so easily become inaccurate. So this patch
> converts the accounting to be based on the number of pages only (which is
> precise) and removes the page->pobjects field completely. This is also
> ultimately simpler.
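> 
> With the conversion, the list push from the pseudo-code above reduces to
> (paraphrasing put_cpu_partial() in the hunks below):
> 
> page->pages = oldpage->pages + 1; // still precise
> page->next = oldpage;
> // once oldpage->pages >= s->cpu_partial_pages, the existing list is
> // unfrozen to the node partial list first and the new slab starts a
> // fresh per-cpu list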
> 
> To retain the existing set_cpu_partial() heuristic, first calculate the
> target number of objects as before, but then convert it to a target number
> of pages by assuming the pages will be half-filled on average. This
> assumption might obviously also be inaccurate in practice, but it cannot
> degrade to the actual number of pages being equal to the target number of
> objects.
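> 
> A worked example of the conversion in slub_set_cpu_partial(), assuming a
> cache with 32 objects per slab and the default target of 30 objects for
> small sizes:
> 
> 	nr_pages = DIV_ROUND_UP(30 * 2, 32); /* DIV_ROUND_UP(60, 32) == 2 */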
> 
> We could also skip the intermediate step with the target number of objects
> and rewrite the heuristic directly in terms of pages. However, we still have
> the sysfs file cpu_partial, which uses a number of objects, and we could
> break existing users if it suddenly became a number of pages, so this patch
> doesn't do that.
> 
> In practice, after this patch the heuristics limit the size of the percpu
> partial list to at most 2 pages. In case of a reported regression (which
> would mean some workload has benefited from the previous imprecise
> object-based counting), we can tune the heuristics to get a better
> compromise within the new scheme, while still avoiding unexpectedly long
> percpu partial lists.
> 
> [1] https://lore.kernel.org/linux-mm/CAG48ez2Qx5K1Cab-m8BdSibp6wLTip6ro4=-umR7BLsEgjEYzA@mail.gmail.com/
> [2] https://lore.kernel.org/all/2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz/
> 
> ==========
> Evaluation
> ==========
> 
> Mel was kind enough to run v1 through the mmtests machinery for netperf
> (localhost) and hackbench; the most significant results are below. There are
> some apparent regressions, especially with hackbench, which I think
> ultimately boil down to having shorter percpu partial lists on average, with
> some benchmarks benefiting from longer ones. Monitoring slab usage also
> indicated lower memory usage by slab. Based on that, the following patch
> will bump the defaults to allow longer percpu partial lists than after this
> patch.
> 
> However, the goal is certainly not to limit the percpu partial lists to 30
> pages just because previously a specific alloc/free pattern could lead to the
> limit of 30 objects translating into a limit of 30 pages - that would make
> little sense. This is a correctness patch, and if a workload benefits from
> larger lists, the sysfs tuning knobs are still there to allow that.
> 
> Netperf
> 
> 2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads
> per socket), 384GB RAM
> TCP-RR:
> hmean before 127045.79 after 121092.94 (-4.69%, worse)
> stddev before  2634.37 after   1254.08
> UDP-RR:
> hmean before 166985.45 after 160668.94 ( -3.78%, worse)
> stddev before 4059.69 after 1943.63
> 
> 2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads
> per socket), 512GB RAM
> TCP-RR:
> hmean before 84173.25 after 76914.72 ( -8.62%, worse)
> UDP-RR:
> hmean before 93571.12 after 96428.69 ( 3.05%, better)
> stddev before 23118.54 after 16828.14
> 
> 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads
> per socket), 64GB RAM
> TCP-RR:
> hmean before 49984.92 after 48922.27 ( -2.13%, worse)
> stddev before 6248.15 after 4740.51
> UDP-RR:
> hmean before 61854.31 after 68761.81 ( 11.17%, better)
> stddev before 4093.54 after 5898.91
> 
> other machines - within 2%
> 
> Hackbench
> 
> (results before and after the patch, negative % means worse)
> 
> 2-socket AMD EPYC 7713 (64 cores, 128 threads per socket), 256GB RAM
> hackbench-process-sockets
> Amean 	1 	0.5380	0.5583	( -3.78%)
> Amean 	4 	0.7510	0.8150	( -8.52%)
> Amean 	7 	0.7930	0.9533	( -20.22%)
> Amean 	12 	0.7853	1.1313	( -44.06%)
> Amean 	21 	1.1520	1.4993	( -30.15%)
> Amean 	30 	1.6223	1.9237	( -18.57%)
> Amean 	48 	2.6767	2.9903	( -11.72%)
> Amean 	79 	4.0257	5.1150	( -27.06%)
> Amean 	110	5.5193	7.4720	( -35.38%)
> Amean 	141	7.2207	9.9840	( -38.27%)
> Amean 	172	8.4770	12.1963	( -43.88%)
> Amean 	203	9.6473	14.3137	( -48.37%)
> Amean 	234	11.3960	18.7917	( -64.90%)
> Amean 	265	13.9627	22.4607	( -60.86%)
> Amean 	296	14.9163	26.0483	( -74.63%)
> 
> hackbench-thread-sockets
> Amean 	1 	0.5597	0.5877	( -5.00%)
> Amean 	4 	0.7913	0.8960	( -13.23%)
> Amean 	7 	0.8190	1.0017	( -22.30%)
> Amean 	12 	0.9560	1.1727	( -22.66%)
> Amean 	21 	1.7587	1.5660	( 10.96%)
> Amean 	30 	2.4477	1.9807	( 19.08%)
> Amean 	48 	3.4573	3.0630	( 11.41%)
> Amean 	79 	4.7903	5.1733	( -8.00%)
> Amean 	110	6.1370	7.4220	( -20.94%)
> Amean 	141	7.5777	9.2617	( -22.22%)
> Amean 	172	9.2280	11.0907	( -20.18%)
> Amean 	203	10.2793	13.3470	( -29.84%)
> Amean 	234	11.2410	17.1070	( -52.18%)
> Amean 	265	12.5970	23.3323	( -85.22%)
> Amean 	296	17.1540	24.2857	( -41.57%)
> 
> 2-socket Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (20 cores, 40 threads
> per socket), 384GB RAM
> hackbench-process-sockets
> Amean 	1 	0.5760	0.4793	( 16.78%)
> Amean 	4 	0.9430	0.9707	( -2.93%)
> Amean 	7 	1.5517	1.8843	( -21.44%)
> Amean 	12 	2.4903	2.7267	( -9.49%)
> Amean 	21 	3.9560	4.2877	( -8.38%)
> Amean 	30 	5.4613	5.8343	( -6.83%)
> Amean 	48 	8.5337	9.2937	( -8.91%)
> Amean 	79 	14.0670	15.2630	( -8.50%)
> Amean 	110	19.2253	21.2467	( -10.51%)
> Amean 	141	23.7557	25.8550	( -8.84%)
> Amean 	172	28.4407	29.7603	( -4.64%)
> Amean 	203	33.3407	33.9927	( -1.96%)
> Amean 	234	38.3633	39.1150	( -1.96%)
> Amean 	265	43.4420	43.8470	( -0.93%)
> Amean 	296	48.3680	48.9300	( -1.16%)
> 
> hackbench-thread-sockets
> Amean 	1 	0.6080	0.6493	( -6.80%)
> Amean 	4 	1.0000	1.0513	( -5.13%)
> Amean 	7 	1.6607	2.0260	( -22.00%)
> Amean 	12 	2.7637	2.9273	( -5.92%)
> Amean 	21 	5.0613	4.5153	( 10.79%)
> Amean 	30 	6.3340	6.1140	( 3.47%)
> Amean 	48 	9.0567	9.5577	( -5.53%)
> Amean 	79 	14.5657	15.7983	( -8.46%)
> Amean 	110	19.6213	21.6333	( -10.25%)
> Amean 	141	24.1563	26.2697	( -8.75%)
> Amean 	172	28.9687	30.2187	( -4.32%)
> Amean 	203	33.9763	34.6970	( -2.12%)
> Amean 	234	38.8647	39.3207	( -1.17%)
> Amean 	265	44.0813	44.1507	( -0.16%)
> Amean 	296	49.2040	49.4330	( -0.47%)
> 
> 2-socket Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (20 cores, 40 threads
> per socket), 512GB RAM
> hackbench-process-sockets
> Amean 	1 	0.5027	0.5017	( 0.20%)
> Amean 	4 	1.1053	1.2033	( -8.87%)
> Amean 	7 	1.8760	2.1820	( -16.31%)
> Amean 	12 	2.9053	3.1810	( -9.49%)
> Amean 	21 	4.6777	4.9920	( -6.72%)
> Amean 	30 	6.5180	6.7827	( -4.06%)
> Amean 	48 	10.0710	10.5227	( -4.48%)
> Amean 	79 	16.4250	17.5053	( -6.58%)
> Amean 	110	22.6203	24.4617	( -8.14%)
> Amean 	141	28.0967	31.0363	( -10.46%)
> Amean 	172	34.4030	36.9233	( -7.33%)
> Amean 	203	40.5933	43.0850	( -6.14%)
> Amean 	234	46.6477	48.7220	( -4.45%)
> Amean 	265	53.0530	53.9597	( -1.71%)
> Amean 	296	59.2760	59.9213	( -1.09%)
> 
> hackbench-thread-sockets
> Amean 	1 	0.5363	0.5330	( 0.62%)
> Amean 	4 	1.1647	1.2157	( -4.38%)
> Amean 	7 	1.9237	2.2833	( -18.70%)
> Amean 	12 	2.9943	3.3110	( -10.58%)
> Amean 	21 	4.9987	5.1880	( -3.79%)
> Amean 	30 	6.7583	7.0043	( -3.64%)
> Amean 	48 	10.4547	10.8353	( -3.64%)
> Amean 	79 	16.6707	17.6790	( -6.05%)
> Amean 	110	22.8207	24.4403	( -7.10%)
> Amean 	141	28.7090	31.0533	( -8.17%)
> Amean 	172	34.9387	36.8260	( -5.40%)
> Amean 	203	41.1567	43.0450	( -4.59%)
> Amean 	234	47.3790	48.5307	( -2.43%)
> Amean 	265	53.9543	54.6987	( -1.38%)
> Amean 	296	60.0820	60.2163	( -0.22%)
> 
> 1-socket Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz (4 cores, 8 threads),
> 32 GB RAM
> hackbench-process-sockets
> Amean 	1 	1.4760	1.5773	( -6.87%)
> Amean 	3 	3.9370	4.0910	( -3.91%)
> Amean 	5 	6.6797	6.9357	( -3.83%)
> Amean 	7 	9.3367	9.7150	( -4.05%)
> Amean 	12	15.7627	16.1400	( -2.39%)
> Amean 	18	23.5360	23.6890	( -0.65%)
> Amean 	24	31.0663	31.3137	( -0.80%)
> Amean 	30	38.7283	39.0037	( -0.71%)
> Amean 	32	41.3417	41.6097	( -0.65%)
> 
> hackbench-thread-sockets
> Amean 	1 	1.5250	1.6043	( -5.20%)
> Amean 	3 	4.0897	4.2603	( -4.17%)
> Amean 	5 	6.7760	7.0933	( -4.68%)
> Amean 	7 	9.4817	9.9157	( -4.58%)
> Amean 	12	15.9610	16.3937	( -2.71%)
> Amean 	18	23.9543	24.3417	( -1.62%)
> Amean 	24	31.4400	31.7217	( -0.90%)
> Amean 	30	39.2457	39.5467	( -0.77%)
> Amean 	32	41.8267	42.1230	( -0.71%)
> 
> 2-socket Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (12 cores, 24 threads
> per socket), 64GB RAM
> hackbench-process-sockets
> Amean 	1 	1.0347	1.0880	( -5.15%)
> Amean 	4 	1.7267	1.8527	( -7.30%)
> Amean 	7 	2.6707	2.8110	( -5.25%)
> Amean 	12 	4.1617	4.3383	( -4.25%)
> Amean 	21 	7.0070	7.2600	( -3.61%)
> Amean 	30 	9.9187	10.2397	( -3.24%)
> Amean 	48 	15.6710	16.3923	( -4.60%)
> Amean 	79 	24.7743	26.1247	( -5.45%)
> Amean 	110	34.3000	35.9307	( -4.75%)
> Amean 	141	44.2043	44.8010	( -1.35%)
> Amean 	172	54.2430	54.7260	( -0.89%)
> Amean 	192	60.6557	60.9777	( -0.53%)
> 
> hackbench-thread-sockets
> Amean 	1 	1.0610	1.1353	( -7.01%)
> Amean 	4 	1.7543	1.9140	( -9.10%)
> Amean 	7 	2.7840	2.9573	( -6.23%)
> Amean 	12 	4.3813	4.4937	( -2.56%)
> Amean 	21 	7.3460	7.5350	( -2.57%)
> Amean 	30 	10.2313	10.5190	( -2.81%)
> Amean 	48 	15.9700	16.5940	( -3.91%)
> Amean 	79 	25.3973	26.6637	( -4.99%)
> Amean 	110	35.1087	36.4797	( -3.91%)
> Amean 	141	45.8220	46.3053	( -1.05%)
> Amean 	172	55.4917	55.7320	( -0.43%)
> Amean 	192	62.7490	62.5410	( 0.33%)
> 
> Reported-by: Jann Horn <jannh@...gle.com>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> ---
> Changes in v2:
> - added evaluation results to changelog
> - added patch 2 bumping the defaults
>  include/linux/mm_types.h |  2 -
>  include/linux/slub_def.h | 13 +-----
>  mm/slub.c                | 89 ++++++++++++++++++++++++++--------------
>  3 files changed, 61 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 7f8ee09c711f..68ffa064b7a8 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -124,10 +124,8 @@ struct page {
>  					struct page *next;
>  #ifdef CONFIG_64BIT
>  					int pages;	/* Nr of pages left */
> -					int pobjects;	/* Approximate count */
>  #else
>  					short int pages;
> -					short int pobjects;
>  #endif
>  				};
>  			};
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 85499f0586b0..0fa751b946fa 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -99,6 +99,8 @@ struct kmem_cache {
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/* Number of per cpu partial objects to keep around */
>  	unsigned int cpu_partial;
> +	/* Number of per cpu partial pages to keep around */
> +	unsigned int cpu_partial_pages;
>  #endif
>  	struct kmem_cache_order_objects oo;
>  
> @@ -141,17 +143,6 @@ struct kmem_cache {
>  	struct kmem_cache_node *node[MAX_NUMNODES];
>  };
>  
> -#ifdef CONFIG_SLUB_CPU_PARTIAL
> -#define slub_cpu_partial(s)		((s)->cpu_partial)
> -#define slub_set_cpu_partial(s, n)		\
> -({						\
> -	slub_cpu_partial(s) = (n);		\
> -})
> -#else
> -#define slub_cpu_partial(s)		(0)
> -#define slub_set_cpu_partial(s, n)
> -#endif /* CONFIG_SLUB_CPU_PARTIAL */
> -
>  #ifdef CONFIG_SYSFS
>  #define SLAB_SUPPORTS_SYSFS
>  void sysfs_slab_unlink(struct kmem_cache *);
> diff --git a/mm/slub.c b/mm/slub.c
> index 3d2025f7163b..3757f31c5d97 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -414,6 +414,29 @@ static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
>  	return x.x & OO_MASK;
>  }
>  
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +static void slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
> +{
> +	unsigned int nr_pages;
> +
> +	s->cpu_partial = nr_objects;
> +
> +	/*
> +	 * We take the number of objects but actually limit the number of
> +	 * pages on the per cpu partial list, in order to limit excessive
> +	 * growth of the list. For simplicity we assume that the pages will
> +	 * be half-full.
> +	 */
> +	nr_pages = DIV_ROUND_UP(nr_objects * 2, oo_objects(s->oo));
> +	s->cpu_partial_pages = nr_pages;
> +}
> +#else
> +static inline void
> +slub_set_cpu_partial(struct kmem_cache *s, unsigned int nr_objects)
> +{
> +}
> +#endif /* CONFIG_SLUB_CPU_PARTIAL */
> +
>  /*
>   * Per slab locking using the pagelock
>   */
> @@ -2045,7 +2068,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
>   */
>  static inline void *acquire_slab(struct kmem_cache *s,
>  		struct kmem_cache_node *n, struct page *page,
> -		int mode, int *objects)
> +		int mode)
>  {
>  	void *freelist;
>  	unsigned long counters;
> @@ -2061,7 +2084,6 @@ static inline void *acquire_slab(struct kmem_cache *s,
>  	freelist = page->freelist;
>  	counters = page->counters;
>  	new.counters = counters;
> -	*objects = new.objects - new.inuse;
>  	if (mode) {
>  		new.inuse = page->objects;
>  		new.freelist = NULL;
> @@ -2099,9 +2121,8 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  {
>  	struct page *page, *page2;
>  	void *object = NULL;
> -	unsigned int available = 0;
>  	unsigned long flags;
> -	int objects;
> +	unsigned int partial_pages = 0;
>  
>  	/*
>  	 * Racy check. If we mistakenly see no partial slabs then we
> @@ -2119,11 +2140,10 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  		if (!pfmemalloc_match(page, gfpflags))
>  			continue;
>  
> -		t = acquire_slab(s, n, page, object == NULL, &objects);
> +		t = acquire_slab(s, n, page, object == NULL);
>  		if (!t)
>  			break;
>  
> -		available += objects;
>  		if (!object) {
>  			*ret_page = page;
>  			stat(s, ALLOC_FROM_PARTIAL);
> @@ -2131,10 +2151,15 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>  		} else {
>  			put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +			partial_pages++;
>  		}
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		if (!kmem_cache_has_cpu_partial(s)
> -			|| available > slub_cpu_partial(s) / 2)
> +			|| partial_pages > s->cpu_partial_pages / 2)
>  			break;
> +#else
> +		break;
> +#endif
>  
>  	}
>  	spin_unlock_irqrestore(&n->list_lock, flags);
> @@ -2539,14 +2564,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  	struct page *page_to_unfreeze = NULL;
>  	unsigned long flags;
>  	int pages = 0;
> -	int pobjects = 0;
>  
>  	local_lock_irqsave(&s->cpu_slab->lock, flags);
>  
>  	oldpage = this_cpu_read(s->cpu_slab->partial);
>  
>  	if (oldpage) {
> -		if (drain && oldpage->pobjects > slub_cpu_partial(s)) {
> +		if (drain && oldpage->pages >= s->cpu_partial_pages) {
>  			/*
>  			 * Partial array is full. Move the existing set to the
>  			 * per node partial list. Postpone the actual unfreezing
> @@ -2555,16 +2579,13 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  			page_to_unfreeze = oldpage;
>  			oldpage = NULL;
>  		} else {
> -			pobjects = oldpage->pobjects;
>  			pages = oldpage->pages;
>  		}
>  	}
>  
>  	pages++;
> -	pobjects += page->objects - page->inuse;
>  
>  	page->pages = pages;
> -	page->pobjects = pobjects;
>  	page->next = oldpage;
>  
>  	this_cpu_write(s->cpu_slab->partial, page);
> @@ -3980,6 +4001,8 @@ static void set_min_partial(struct kmem_cache *s, unsigned long min)
>  static void set_cpu_partial(struct kmem_cache *s)
>  {
>  #ifdef CONFIG_SLUB_CPU_PARTIAL
> +	unsigned int nr_objects;
> +
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3989,24 +4012,22 @@ static void set_cpu_partial(struct kmem_cache *s)
>  	 * filled up again with minimal effort. The slab will never hit the
>  	 * per node partial lists and therefore no locking will be required.
>  	 *
> -	 * This setting also determines
> -	 *
> -	 * A) The number of objects from per cpu partial slabs dumped to the
> -	 *    per node list when we reach the limit.
> -	 * B) The number of objects in cpu partial slabs to extract from the
> -	 *    per node list when we run out of per cpu objects. We only fetch
> -	 *    50% to keep some capacity around for frees.
> +	 * For backwards compatibility reasons, this is determined as number
> +	 * of objects, even though we now limit maximum number of pages, see
> +	 * slub_set_cpu_partial()
>  	 */
>  	if (!kmem_cache_has_cpu_partial(s))
> -		slub_set_cpu_partial(s, 0);
> +		nr_objects = 0;
>  	else if (s->size >= PAGE_SIZE)
> -		slub_set_cpu_partial(s, 2);
> +		nr_objects = 2;
>  	else if (s->size >= 1024)
> -		slub_set_cpu_partial(s, 6);
> +		nr_objects = 6;
>  	else if (s->size >= 256)
> -		slub_set_cpu_partial(s, 13);
> +		nr_objects = 13;
>  	else
> -		slub_set_cpu_partial(s, 30);
> +		nr_objects = 30;
> +
> +	slub_set_cpu_partial(s, nr_objects);
>  #endif
>  }
>  
> @@ -5379,7 +5400,12 @@ SLAB_ATTR(min_partial);
>  
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
> -	return sysfs_emit(buf, "%u\n", slub_cpu_partial(s));
> +	unsigned int nr_partial = 0;
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +	nr_partial = s->cpu_partial;
> +#endif
> +
> +	return sysfs_emit(buf, "%u\n", nr_partial);
>  }
>  
>  static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
> @@ -5450,12 +5476,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  
>  		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
>  
> -		if (page) {
> +		if (page)
>  			pages += page->pages;
> -			objects += page->pobjects;
> -		}
>  	}
>  
> +	/* Approximate half-full pages, see slub_set_cpu_partial() */
> +	objects = (pages * oo_objects(s->oo)) / 2;
>  	len += sysfs_emit_at(buf, len, "%d(%d)", objects, pages);
>  
>  #ifdef CONFIG_SMP
> @@ -5463,9 +5489,12 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  		struct page *page;
>  
>  		page = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));
> -		if (page)
> +		if (page) {
> +			pages = READ_ONCE(page->pages);
> +			objects = (pages * oo_objects(s->oo)) / 2;
>  			len += sysfs_emit_at(buf, len, " C%d=%d(%d)",
> -					     cpu, page->pobjects, page->pages);
> +					     cpu, objects, pages);
> +		}
>  	}
>  #endif
>  	len += sysfs_emit_at(buf, len, "\n");
> 
