Message-ID: <Zi_EEOUS_iCh2Nfh@P9FQF9L96D>
Date: Mon, 29 Apr 2024 09:00:16 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/7] memcg: reduce memory for the lruvec and memcg
stats
On Fri, Apr 26, 2024 at 05:37:29PM -0700, Shakeel Butt wrote:
> At the moment, the amount of memory allocated for stats related structs
> in the mem_cgroup corresponds to the size of enum node_stat_item.
> However, not all fields in enum node_stat_item have corresponding memcg
> stats. So, let's use an indirection mechanism similar to the one used for
> memcg vmstats management.
>
> For a given x86_64 config, the size of stats with and without patch is:
>
> structs (size in bytes)        w/o    with
>
> struct lruvec_stats           1128     648
> struct lruvec_stats_percpu     752     432
> struct memcg_vmstats          1832    1352
> struct memcg_vmstats_percpu   1280     960
>
> The memory savings is further compounded by the fact that these structs
> are allocated for each cpu and for each node. To be precise, for each
> memcg the memory saved would be:
>
> Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODES * NR_CPUS) +
> (21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)
>
> Where 21 is the number of fields eliminated.
Nice savings!
>
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
> ---
> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 115 insertions(+), 23 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5e337ed6c6bf..c164bc9b8ed6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -576,35 +576,105 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
> return mz;
> }
>
> +/* Subset of node_stat_item for memcg stats */
> +static const unsigned int memcg_node_stat_items[] = {
> + NR_INACTIVE_ANON,
> + NR_ACTIVE_ANON,
> + NR_INACTIVE_FILE,
> + NR_ACTIVE_FILE,
> + NR_UNEVICTABLE,
> + NR_SLAB_RECLAIMABLE_B,
> + NR_SLAB_UNRECLAIMABLE_B,
> + WORKINGSET_REFAULT_ANON,
> + WORKINGSET_REFAULT_FILE,
> + WORKINGSET_ACTIVATE_ANON,
> + WORKINGSET_ACTIVATE_FILE,
> + WORKINGSET_RESTORE_ANON,
> + WORKINGSET_RESTORE_FILE,
> + WORKINGSET_NODERECLAIM,
> + NR_ANON_MAPPED,
> + NR_FILE_MAPPED,
> + NR_FILE_PAGES,
> + NR_FILE_DIRTY,
> + NR_WRITEBACK,
> + NR_SHMEM,
> + NR_SHMEM_THPS,
> + NR_FILE_THPS,
> + NR_ANON_THPS,
> + NR_KERNEL_STACK_KB,
> + NR_PAGETABLE,
> + NR_SECONDARY_PAGETABLE,
> +#ifdef CONFIG_SWAP
> + NR_SWAPCACHE,
> +#endif
> +};
> +
> +static const unsigned int memcg_stat_items[] = {
> + MEMCG_SWAP,
> + MEMCG_SOCK,
> + MEMCG_PERCPU_B,
> + MEMCG_VMALLOC,
> + MEMCG_KMEM,
> + MEMCG_ZSWAP_B,
> + MEMCG_ZSWAPPED,
> +};
> +
> +#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> +#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
> +static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> +
> +static void init_memcg_stats(void)
> +{
> + int8_t i, j = 0;
> +
> + /* Switch to short once this failure occurs. */
> + BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
> +
> + for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
> + mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
> +
> + for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
> + mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
> +}
> +
> +static inline int memcg_stats_index(int idx)
> +{
> + return mem_cgroup_stats_index[idx] - 1;
> +}
Hm, I'm slightly worried about the performance penalty due to the increased cache
footprint. Can't we have some formula to translate idx to memcg_idx instead of
a translation table?
If it requires a re-arrangement of items, we can add a translation table on the
read side to preserve the visible order in procfs/sysfs.
Or I'm overthinking and the real difference is negligible?
Thanks!