linux-kernel - Re: Showing /sys/fs/cgroup/memory/memory.stat very slow on some machines

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 24 Jul 2018 12:05:35 +0200
From:   Bruce Merry <bmerry@....ac.za>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: Showing /sys/fs/cgroup/memory/memory.stat very slow on some machines

On 18 July 2018 at 19:40, Bruce Merry <bmerry@....ac.za> wrote:
>> Yes, very easy to produce zombies, though I don't think kernel
>> provides any way to tell how many zombies exist on the system.
>>
>> To create a zombie, first create a memcg node, enter that memcg,
>> create a tmpfs file of few KiBs, exit the memcg and rmdir the memcg.
>> That memcg will be a zombie until you delete that tmpfs file.
>
> Thanks, that makes sense. I'll see if I can reproduce the issue.

Hi

I've had some time to experiment with this issue, and I've now got a
way to reproduce it fairly reliably, including with a stock 4.17.8
kernel. However, it's very phase-of-the-moon stuff, and even
apparently trivial changes (like switching the order in which the
files are statted) makes the issue disappear.

To reproduce:
1. Start cadvisor running. I use the 0.30.2 binary from Github, and
run it with sudo ./cadvisor-0.30.2 --logtostderr=true
2. Run the Python 3 script below, which repeatedly creates a cgroup,
enters it, stats some files in it, and leaves it again (and removes
it). It takes a few minutes to run.
3. time cat /sys/fs/cgroup/memory/memory.stat. It now takes about 20ms for me.
4. sudo sysctl vm.drop_caches=2
5. time cat /sys/fs/cgroup/memory/memory.stat. It is back to 1-2ms.

I've also added some code to memcg_stat_show to report the number of
cgroups in the hierarchy (iterations in for_each_mem_cgroup_tree).
Running the script increases it from ~700 to ~41000. The script
iterates 250,000 times, so only some fraction of the cgroups become
zombies.

I also tried the suggestion of force_empty: it makes the problem go
away, but is also very, very slow (about 0.5s per iteration), and
given the sensitivity of the test to small changes I don't know how
meaningful that is.

Reproduction code (if you have tqdm installed you get a nice progress
bar, but not required). Hopefully Gmail doesn't do any format
mangling:

#!/usr/bin/env python3
import os

try:
    from tqdm import trange as range
except ImportError:
    pass

def clean():
    try:
        os.rmdir(name)
    except FileNotFoundError:
        pass

def move_to(cgroup):
    with open(cgroup + '/tasks', 'w') as f:
        print(pid, file=f)

pid = os.getpid()
os.chdir('/sys/fs/cgroup/memory')
name = 'dummy'
N = 250000
clean()
try:
    for i in range(N):
        os.mkdir(name)
        move_to(name)
        for filename in ['memory.stat', 'memory.swappiness']:
            os.stat(os.path.join(name, filename))
        move_to('user.slice')
        os.rmdir(name)
finally:
    move_to('user.slice')
    clean()

Regards
Bruce
-- 
Bruce Merry
Senior Science Processing Developer
SKA South Africa