Message-ID: <CAJD7tkZXVFS8XUi07vHAa5WTWR3myi=sSmrLrVpH0+AKCjDkbw@mail.gmail.com>
Date: Fri, 30 Aug 2024 00:13:47 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: liujing <liujing@...s.chinamobile.com>
Cc: akpm <akpm@...ux-foundation.org>, linux-mm <linux-mm@...ck.org>, 
	linux-kernel <linux-kernel@...r.kernel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>, 
	Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>
Subject: Re: The percpu memory used by memcg cannot be cleared

On Thu, Aug 29, 2024 at 8:22 PM liujing <liujing@...s.chinamobile.com> wrote:
>
> Hello, Linux boss,
>
>         I found a problem while using Linux memcg. When I turned swap off, the memcg memory created by the following script could not be cleared with echo 0 > memory.force_empty, as explained below.

(Adding memcg maintainers in case they are interested)

It's not a problem; it's the way the Linux kernel currently behaves in
terms of handling deleted memcgs that are still referenced in the
kernel (i.e. offline/dying/zombie memcgs).
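
On a cgroup v2 hierarchy the kernel exposes the number of such dying
cgroups directly: each cgroup's cgroup.stat file has an
nr_dying_descendants field. A minimal sketch for reading it, assuming
cgroup v2 is mounted at /sys/fs/cgroup (the report below uses the v1
memory hierarchy, which has no such file); dying_descendants is a
hypothetical helper name. The field counts dying cgroups of all kinds,
which includes zombie memcgs.

```shell
#!/bin/bash
# Hypothetical helper: print nr_dying_descendants from a cgroup v2
# cgroup.stat file. Defaults to the root cgroup, ASSUMING cgroup v2 is
# mounted at /sys/fs/cgroup. The value counts all dying descendant
# cgroups, zombie memcgs included.
dying_descendants() {
    local stat_file=${1:-/sys/fs/cgroup/cgroup.stat}
    awk '$1 == "nr_dying_descendants" { print $2 }' "$stat_file"
}
```

For example, dying_descendants /sys/fs/cgroup/user.slice/cgroup.stat
would report the dying cgroups below user.slice.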

>
> ----------------------------------------------------------------------------------------------------------
> step1: swapoff -a
>
>
> step2: use this script to create memcgs
>
> #!/bin/bash
> mkdir -p /tmp/test
> for i in $(seq 2000)
> do
>         sudo mkdir -p /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}
>         sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}/tasks
>         sudo echo 'data' > /tmp/test/test${i}

Assuming /tmp is a tmpfs mount, here you created 2000 child memcgs and
allocated one tmpfs page in each of them. So each of those child
memcgs is charged for one page of memory, and each charge holds a
reference to the respective memcg.

>         sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/tasks
>         sudo rmdir /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}

Then you deleted those memcgs, but the kernel cannot free them yet
because the tmpfs memory you allocated above is still charged to them.

> done
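
The analysis above assumes /tmp is a tmpfs mount. One way to check
that, sketched with GNU coreutils stat -f (whose %T format prints the
filesystem type name); is_tmpfs is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if PATH (default /tmp) is on a tmpfs mount.
# GNU `stat -f -c %T` prints the filesystem type name, e.g. "tmpfs".
# A missing path makes stat fail, yielding an empty string (not tmpfs).
is_tmpfs() {
    [[ $(stat -f -c %T "${1:-/tmp}" 2>/dev/null) == tmpfs ]]
}
```

Usage: is_tmpfs /tmp && echo "/tmp is tmpfs". If /tmp were a regular
disk filesystem instead, the written pages would be ordinary page cache,
which can normally be reclaimed without swap.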

>
>
> step3: view the /proc/cgroups and /proc/meminfo files
>
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name    hierarchy       num_cgroups     enabled
> cpuset          10              1               1
> cpu             4               1               1
> cpuacct         4               1               1
> blkio           13              1               1
> memory          14              2009            1

Here you can see the cgroups you deleted still exist in the kernel.

> devices         6               94              1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           600576 kB

The percpu memory you observe here is likely the per-CPU metadata that
the kernel uses to keep track of each memcg. Since the memcgs are not
freed, the metadata is not freed either.
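
On the cgroup v1 hierarchy used here, a rough count of these zombie
memcgs is the memory controller's num_cgroups minus the memcg
directories still visible under the hierarchy. A sketch, assuming the
v1 memory controller is mounted at /sys/fs/cgroup/memory;
zombie_memcgs is a hypothetical helper name. The two reads are not
atomic, so the result can be off if cgroups are created or deleted
concurrently.

```shell
#!/bin/bash
# Hypothetical helper: estimate zombie (deleted but not yet freed) memcgs
# on a cgroup v1 system as num_cgroups for "memory" in /proc/cgroups minus
# the number of directories (root included) under the memory mount.
# ASSUMES the v1 memory controller is mounted at /sys/fs/cgroup/memory.
zombie_memcgs() {
    local cgroups_file=${1:-/proc/cgroups}
    local memcg_root=${2:-/sys/fs/cgroup/memory}
    local total visible
    total=$(awk '$1 == "memory" { print $3 }' "$cgroups_file")
    visible=$(find "$memcg_root" -type d | wc -l)
    echo $(( total - visible ))
}
```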

>
>
> step4: when I run "echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty", neither the memory num_cgroups nor Percpu changes

Yes, because at this point there is no swap, so the tmpfs memory
charged to the deleted memcgs can be neither reclaimed nor freed,
and the references held by those charges cannot be dropped.
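
Whether any swap is active can be checked from /proc/swaps, which
prints a header line followed by one line per active swap area. A
minimal sketch; swap_active is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeed if at least one swap area is active, i.e.
# the swaps file (default /proc/swaps) has any line after its header.
# A missing file makes awk fail, which also reports "no swap".
swap_active() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}
```

Usage: swap_active || echo "no swap: tmpfs pages cannot be reclaimed".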

>
> [root@...alhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name    hierarchy       num_cgroups     enabled
> cpuset          10              1               1
> cpu             4               1               1
> cpuacct         4               1               1
> blkio           13              1               1
> memory          14              2039            1
> devices         6               87              1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           600576 kB
>
>
> step 5: turn swap back on with swapon -a, then run echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty again
>
> [root@...alhost ~]# swapon -a
> [root@...alhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty

When you enable swap and try to reclaim memory from the cgroups
again, the kernel is able to reclaim the tmpfs memory by swapping it
out. The kernel is smart enough at this point not to charge the swap
slots to the deleted cgroups, but to their living/online parent. The
tmpfs memory is then uncharged and freed, the references to the
deleted cgroups are dropped, and the kernel can finally free them.
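
Freeing a deleted cgroup after its last reference is dropped is
deferred work in the kernel, so num_cgroups may keep falling briefly
after memory.force_empty returns. A sketch that polls /proc/cgroups
until two consecutive samples agree; memcg_count and
wait_for_memcg_release are hypothetical helper names, and the retry
count and interval are arbitrary defaults:

```shell
#!/bin/bash
# Hypothetical helper: print the memory controller's num_cgroups from a
# /proc/cgroups-format file.
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Hypothetical helper: sample num_cgroups every INTERVAL seconds until two
# consecutive samples agree or MAX_TRIES samples were taken, then print
# the last value seen.
wait_for_memcg_release() {
    local file=${1:-/proc/cgroups} max_tries=${2:-10} interval=${3:-1}
    local prev cur i
    prev=$(memcg_count "$file")
    for ((i = 0; i < max_tries; i++)); do
        sleep "$interval"
        cur=$(memcg_count "$file")
        [[ $cur == "$prev" ]] && break
        prev=$cur
    done
    echo "$prev"
}
```

Usage: after the echo 0 > .../memory.force_empty step, run
wait_for_memcg_release /proc/cgroups 10 1.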

>
>
> step 6: view the /proc/cgroups and /proc/meminfo files again; both the memory num_cgroups and Percpu have dropped.
> [root@...alhost ~]# cat /proc/cgroups
> #subsys_name    hierarchy       num_cgroups     enabled
> cpuset          10              1               1
> cpu             4               1               1
> cpuacct         4               1               1
> blkio           13              1               1
> memory          14              185             1
> devices         6               87              1
> freezer         9               1               1
>
> [root@...alhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           120832 kB

Now the memcgs are freed, and their associated per-CPU metadata is also freed.
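
As a rough cross-check with the numbers in the report: between step 3
and step 6, Percpu fell by 600576 - 120832 = 479744 kB while the memory
num_cgroups fell by 2009 - 185 = 1824, i.e. roughly 263 kB per freed
memcg, total across all CPUs. This assumes the whole Percpu drop comes
from the freed memcgs, and the per-memcg figure will vary with the
number of CPUs, which the report does not state. A sketch of the
calculation; percpu_kb and percpu_per_memcg are hypothetical helper
names:

```shell
#!/bin/bash
# Hypothetical helper: print the Percpu value (kB) from a meminfo-format file.
percpu_kb() {
    awk '$1 == "Percpu:" { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: average Percpu released per freed memcg, in kB.
# Args: percpu_before_kb percpu_after_kb memcgs_before memcgs_after
percpu_per_memcg() {
    awk -v pb="$1" -v pa="$2" -v nb="$3" -v na="$4" 'BEGIN {
        if (nb == na) { print "n/a"; exit 1 }  # no memcgs were freed
        printf "%.1f\n", (pb - pa) / (nb - na)
    }'
}
```

Usage: percpu_per_memcg 600576 120832 2009 185 prints 263.0.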


> --------------------------------------------------------------------------------------------------------
>
>
> Therefore, I want to know why swap affects memcg memory reclamation. The echo 0 > memory.force_empty interface should force the memory used by the cgroup to be reclaimed.
> I look forward to hearing back from the community.

I hope it's now clear why the per-CPU memory cannot be freed when you
use memory.force_empty on the parent memcg: the per-CPU memory is the
metadata of the deleted memcgs, and those cannot be freed until the
memory charged to them is freed. That requires swap, because the data
lives in tmpfs rather than a regular file, so there is no backing file
to write it back to and swapping it out is the only way to reclaim it.
