Date:   Tue, 3 Jul 2018 08:43:23 +0200
From:   Bruce Merry <bmerry@....ac.za>
To:     linux-kernel@...r.kernel.org
Subject: Showing /sys/fs/cgroup/memory/memory.stat very slow on some machines

Hi

I've run into an odd performance issue in the kernel. Not being a
kernel dev and not knowing terribly much about cgroups, I'm looking
for advice on diagnosing the problem further (I discovered this while
trying to pin down high CPU load in cadvisor).

On some machines in our production system, cat
/sys/fs/cgroup/memory/memory.stat is extremely slow (500ms on one
machine), while on other nominally identical machines it is fast
(2ms).
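
For reference, this is roughly how I'm measuring it (the path is the
standard cgroup v1 mount point; adjust it if your setup mounts the
controller elsewhere):

  # Time a single read of the root memory controller's stat file.
  # ~500ms on an affected machine, ~2ms on a healthy one.
  time cat /sys/fs/cgroup/memory/memory.stat > /dev/null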

One other thing I've noticed is that the affected machines generally
have much larger values for SUnreclaim in /proc/meminfo (up to several
GB), and slabtop reports >1GB in the dentry cache.
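
Concretely, something like this is what I'm looking at (-s c sorts
slabtop's output by cache size):

  # Unreclaimable slab memory (several GB on the affected machines).
  grep SUnreclaim /proc/meminfo
  # Largest slab caches by size; dentry is >1GB on the bad machines.
  slabtop -o -s c | head -n 15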

Before I tracked the original problem (high CPU usage in cadvisor)
down to this, I rebooted one of the machines and the original problem
went away, so it seems to be cleared by a reboot. I'm reluctant to
reboot more machines to confirm, since I don't have a sure-fire way to
reproduce the problem again in order to debug it.

The machines are running Ubuntu 16.04 with kernel 4.13.0-41-generic.
They're running Docker, which creates a bunch of cgroups, but not an
excessive number: there are 106 memory.stat files in
/sys/fs/cgroup/memory.
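
(Counted with a quick find, in case the number is a useful comparison
point on other systems:)

  # Number of memory cgroups, i.e. memory.stat files, under the controller.
  find /sys/fs/cgroup/memory -name memory.stat | wc -l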

Digging a bit further, I see that cat
/sys/fs/cgroup/memory/system.slice/memory.stat also takes ~500ms, but
"find /sys/fs/cgroup/memory/system.slice -mindepth 2 -name memory.stat
| xargs cat" takes only 8ms.
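
Spelled out, the comparison is between these two (the first reads the
aggregated stat file of the slice, the second reads every descendant's
stat file individually):

  # Aggregated memory.stat for the whole slice: ~500ms.
  time cat /sys/fs/cgroup/memory/system.slice/memory.stat > /dev/null

  # All descendant memory.stat files read one by one: ~8ms total.
  time sh -c 'find /sys/fs/cgroup/memory/system.slice -mindepth 2 \
      -name memory.stat | xargs cat > /dev/null'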

Any thoughts, particularly on what I should compare between the good
and bad machines to narrow down the cause, or better yet, how to
prevent it from happening?

Thanks
Bruce
-- 
Bruce Merry
Senior Science Processing Developer
SKA South Africa
