Message-ID: <CAJD7tkZXVFS8XUi07vHAa5WTWR3myi=sSmrLrVpH0+AKCjDkbw@mail.gmail.com>
Date: Fri, 30 Aug 2024 00:13:47 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: liujing <liujing@...s.chinamobile.com>
Cc: akpm <akpm@...ux-foundation.org>, linux-mm <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>
Subject: Re: The percpu memory used by memcg cannot be cleared
On Thu, Aug 29, 2024 at 8:22 PM liujing <liujing@...s.chinamobile.com> wrote:
>
> Hello, linux boss
>
> I found a problem while using Linux memcg. With swap turned off, the memcg memory I created with the following script could not be released with echo 0 > memory.force_empty, as explained below.
(Adding memcg maintainers in case they are interested)
This is not a bug; it is how the Linux kernel currently handles
deleted memcgs that are still referenced in the kernel (i.e.
offline/dying/zombie memcgs).
>
> ----------------------------------------------------------------------------------------------------------
> step1:swapoff -a
>
>
> step2:use this script to create memcg
>
> #!/bin/bash
> mkdir -p /tmp/test
> for i in $(seq 2000)
> do
> sudo mkdir -p /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}
> sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}/tasks
> sudo echo 'data' > /tmp/test/test${i}
Assuming /tmp is a tmpfs mount, here you created 2000 child memcgs and
allocated one tmpfs page in each of them. So each of those child
memcgs is charged for one page of memory, and each charge holds a
reference to the respective memcg.
> sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/tasks
> sudo rmdir /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}
Then you deleted those memcgs, but the kernel cannot free them yet
because the tmpfs memory you allocated above is still charged to them.
> done
>
>
> step3:view /proc/cgroups and /proc/meminfo files
>
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 2009 1
Here you can see the cgroups you deleted still exist in the kernel.
> devices 6 94 1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 600576 kB
The percpu memory you observe here is likely the per-CPU metadata that
the kernel uses to keep track of each memcg. Since the memcgs are not
freed, the metadata is not freed either.
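
If you want to watch these two numbers while repeating the experiment,
something like the following could help. It is only a sketch: the
function names memcg_count and percpu_kb are made up for illustration,
and the optional file argument exists only so the parsing can be tried
on saved output instead of the live /proc files.

```shell
#!/bin/bash

# Print the num_cgroups column (third field) for the "memory" controller.
# Reads /proc/cgroups unless another file is given (hypothetical helper).
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Print the Percpu value in kB.
# Reads /proc/meminfo unless another file is given (hypothetical helper).
percpu_kb() {
    awk '$1 == "Percpu:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Calling memcg_count and percpu_kb with no arguments before and after
each step gives the same figures as the cat commands quoted here.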
>
>
> step4:when I use "echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty", I find the num_cgroups of memory and percpu have not changed
Yes. With no swap available, the tmpfs pages charged to the deleted
memcgs cannot be reclaimed or freed, so the references they hold
cannot be dropped.
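
A quick way to check for this situation before writing to
memory.force_empty is to see whether any swap area is active. The
sketch below assumes the usual /proc/swaps layout (one header line,
then one line per active swap area); swap_active is a hypothetical
helper name, and the file argument is only there for testing.

```shell
#!/bin/bash

# Succeed (exit status 0) if at least one swap area is listed.
# /proc/swaps has a header line followed by one line per active area,
# so any line after the first means swap is enabled.
swap_active() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}
```

For example, "swap_active || echo 'no swap: tmpfs pages cannot be
reclaimed'" would print the warning on a system where swapoff -a has
been run.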
>
> [root@...alhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 2039 1
> devices 6 87 1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 600576 kB
>
>
> step 5: when I use swapon -a to enable swap, then echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty again
>
> [root@...alhost ~]# swapon -a
> [root@...alhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty
When you enable swap and try to reclaim memory from the cgroups
again, the kernel is able to reclaim the tmpfs memory by swapping it
out. The kernel is smart enough at this point not to charge the swap
slots to the deleted cgroups, but to their living/online parent. The
tmpfs memory is then uncharged and freed, and the refs to the deleted
cgroups are dropped, so the kernel can finally free them.
>
>
> step 6: view /proc/cgroups and /proc/meminfo files. I found that the num_cgroups of memory and percpu have been reduced.
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name hierarchy num_cgroups enabled
> cpuset 10 1 1
> cpu 4 1 1
> cpuacct 4 1 1
> blkio 13 1 1
> memory 14 185 1
> devices 6 87 1
> freezer 9 1 1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu: 120832 kB
Now the memcgs are freed, and their associated per-CPU metadata is also freed.
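
As a rough sanity check, the numbers in your output can be turned into
a per-memcg figure. This is only an estimate: Percpu in /proc/meminfo
covers all per-CPU allocations, other controllers changed their counts
too, and the result depends on the number of CPUs. The function name
percpu_per_memcg_kb is made up for this illustration.

```shell
#!/bin/bash

# Rough per-cgroup per-CPU cost in kB from two samples.
# Arguments: before_kb after_kb before_count after_count
percpu_per_memcg_kb() {
    local freed_kb=$(( $1 - $2 ))
    local freed_cgroups=$(( $3 - $4 ))
    # Avoid dividing by zero when no cgroups were freed between samples.
    if [ "$freed_cgroups" -le 0 ]; then
        echo "no cgroups freed between samples" >&2
        return 1
    fi
    # Integer division, so the result is rounded down.
    echo $(( freed_kb / freed_cgroups ))
}

# Percpu and memory num_cgroups values quoted in this thread
# (steps 4 and 6).
percpu_per_memcg_kb 600576 120832 2039 185   # prints 258
```

So each dying memcg was pinning on the order of 258 kB of per-CPU
memory on this machine, which is consistent with the drop you saw.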
> --------------------------------------------------------------------------------------------------------
>
>
> Therefore, I want to know why swap affects memcg memory reclamation. The echo 0 > memory.force_empty interface should force the memory used by the cgroup to be reclaimed.
> I would like to understand why, and I look forward to hearing back from the community.
I hope it's now clear why the per-CPU memory cannot be freed when you
use memory.force_empty on the parent memcg: the per-CPU memory is the
metadata of the deleted memcgs, and those cannot be freed until the
memory charged to them is freed (which needs swap, because it is
tmpfs memory, not a regular file).