Message-ID: <Zi_EEOUS_iCh2Nfh@P9FQF9L96D>
Date: Mon, 29 Apr 2024 09:00:16 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/7] memcg: reduce memory for the lruvec and memcg
stats
On Fri, Apr 26, 2024 at 05:37:29PM -0700, Shakeel Butt wrote:
> At the moment, the amount of memory allocated for stats related structs
> in the mem_cgroup corresponds to the size of enum node_stat_item.
> However, not all fields in enum node_stat_item have corresponding memcg
> stats. So, let's use an indirection mechanism similar to the one used for
> memcg vmstats management.
>
> For a given x86_64 config, the size of stats with and without patch is:
>
> structs (size in bytes)        w/o    with
>
> struct lruvec_stats           1128     648
> struct lruvec_stats_percpu     752     432
> struct memcg_vmstats          1832    1352
> struct memcg_vmstats_percpu   1280     960
>
> The memory savings is further compounded by the fact that these structs
> are allocated for each cpu and for each node. To be precise, for each
> memcg the memory saved would be:
>
> Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODES * NR_CPUS) +
> (21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)
>
> Where 21 is the number of fields eliminated.
Nice savings!
>
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
> ---
> mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 115 insertions(+), 23 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5e337ed6c6bf..c164bc9b8ed6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -576,35 +576,105 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
> return mz;
> }
>
> +/* Subset of node_stat_item for memcg stats */
> +static const unsigned int memcg_node_stat_items[] = {
> + NR_INACTIVE_ANON,
> + NR_ACTIVE_ANON,
> + NR_INACTIVE_FILE,
> + NR_ACTIVE_FILE,
> + NR_UNEVICTABLE,
> + NR_SLAB_RECLAIMABLE_B,
> + NR_SLAB_UNRECLAIMABLE_B,
> + WORKINGSET_REFAULT_ANON,
> + WORKINGSET_REFAULT_FILE,
> + WORKINGSET_ACTIVATE_ANON,
> + WORKINGSET_ACTIVATE_FILE,
> + WORKINGSET_RESTORE_ANON,
> + WORKINGSET_RESTORE_FILE,
> + WORKINGSET_NODERECLAIM,
> + NR_ANON_MAPPED,
> + NR_FILE_MAPPED,
> + NR_FILE_PAGES,
> + NR_FILE_DIRTY,
> + NR_WRITEBACK,
> + NR_SHMEM,
> + NR_SHMEM_THPS,
> + NR_FILE_THPS,
> + NR_ANON_THPS,
> + NR_KERNEL_STACK_KB,
> + NR_PAGETABLE,
> + NR_SECONDARY_PAGETABLE,
> +#ifdef CONFIG_SWAP
> + NR_SWAPCACHE,
> +#endif
> +};
> +
> +static const unsigned int memcg_stat_items[] = {
> + MEMCG_SWAP,
> + MEMCG_SOCK,
> + MEMCG_PERCPU_B,
> + MEMCG_VMALLOC,
> + MEMCG_KMEM,
> + MEMCG_ZSWAP_B,
> + MEMCG_ZSWAPPED,
> +};
> +
> +#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> +#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
> +static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> +
> +static void init_memcg_stats(void)
> +{
> + int8_t i, j = 0;
> +
> + /* Switch to short once this failure occurs. */
> + BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
> +
> + for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
> + mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
> +
> + for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
> + mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
> +}
> +
> +static inline int memcg_stats_index(int idx)
> +{
> + return mem_cgroup_stats_index[idx] - 1;
> +}
Hm, I'm slightly worried about the performance penalty due to the increased cache
footprint. Can't we have some formula to translate idx to memcg_idx instead of
a translation table?
If it requires a re-arrangement of items, we can add a translation table on the
read side to preserve the visible order in procfs/sysfs.
Or I'm overthinking and the real difference is negligible?
Thanks!