Message-ID: <1cc17b7b-8f5b-6682-ccc0-19ff5f9992e0@yandex-team.ru>
Date: Tue, 14 Aug 2018 12:40:58 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Roman Gushchin <guro@...com>, Johannes Weiner <hannes@...xchg.org>
Cc: Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: [PATCH RFC 1/3] cgroup: list all subsystem states in debugfs
files
On 13.08.2018 20:53, Roman Gushchin wrote:
> On Mon, Aug 13, 2018 at 01:11:19PM -0400, Johannes Weiner wrote:
>> On Mon, Aug 13, 2018 at 06:48:42AM -0700, Tejun Heo wrote:
>>> Hello, Konstantin.
>>>
>>> On Mon, Aug 13, 2018 at 09:58:05AM +0300, Konstantin Khlebnikov wrote:
>>>> After a cgroup is removed, its subsystem state could leak or live in the
>>>> background forever because it is pinned by some reference. For example, a
>>>> memory cgroup could be pinned by pages in the page cache or tmpfs.
>>>>
>>>> This patch adds a common debugfs interface for listing the basic state of
>>>> each controller. A controller can define a callback for dumping its own
>>>> attributes.
>>>>
>>>> In the file /sys/kernel/debug/cgroup/<controller>, each line shows state in
>>>> the format: <common_attr>=<value>... [-- <controller_attr>=<value>... ]
>>>
>>> Seems pretty useful to me. Roman, Johannes, what do you guys think?
>
> Totally agree with the idea and was about to suggest something similar.
>
>> Generally I like the idea of having more introspection into offlined
>> cgroups, but I wonder if having only memory= and swap= could be a
>> little too terse to track down what exactly is pinning the groups.
>>
>> Roman has more experience debugging these pileups, but it seems to me
>> that unless we add a breakdown of memory, and maybe make slabinfo
>> available for these groups, that in practice this might not provide
>> that much more insight than per-cgroup stat counters of dead children.
>
> I agree here.
I don't think we can cover all cases with a single interface.
This debugfs file just gives a simple entry point for debugging:
- paths for guessing the user
- pointers for inspecting structures with gdb via /proc/kcore
- inode numbers for page-types - see the second and third patches
For slab: this could show one of the remaining slabs; in any case, each of
them pins the css.
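
Roughly, the idea for a controller callback is something like the sketch
below (hypothetical names - the callback signature and the printed fields
are my assumptions here, not necessarily what the patch ends up doing):

#include <linux/seq_file.h>
#include <linux/memcontrol.h>
#include <linux/page_counter.h>

/* Sketch of a memcg-specific dump callback: controller attributes get
 * appended after the "--" separator of the common per-css line. */
static void memory_css_debug_show(struct seq_file *m,
                                  struct cgroup_subsys_state *css)
{
        struct mem_cgroup *memcg = container_of(css, struct mem_cgroup, css);

        seq_printf(m, " -- memory=%lu swap=%lu",
                   page_counter_read(&memcg->memory),
                   page_counter_read(&memcg->swap));
}

The generic part would print the common attributes (path, inode number,
css pointer and so on) for every css and then call into the controller,
so each record stays a single grep-able line.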
>
> It's hard to say in advance what numbers are useful, so let's export
> these numbers, but also make the format more extendable, so we can
> easily add new information later. Maybe, something like:
>
> cgroup {
>     path = ...
>     ino = ...
>     main css {
>         refcnt = ...
>         key = value
>         ...
>     }
>     memcg css {
>         refcnt = ...
>         ...
>     }
>     some other controller css {
>     }
>     ...
> }
>
> Also, because we do batch charges, printing numbers without draining stocks
> is not that useful. All stats are also per-cpu cached, which adds some
> inaccuracy.
Seems too verbose. A one-line key=value format is more grep/awk friendly.
Anyway, such extended debugging could be implemented as a gdb plugin, while
a simple list of pointers gives enough information for dumping structures
with gdb alone, without extra plugins.
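
For example, splitting such lines into key/value pairs is trivial even in
plain C, never mind awk (the debugfs path and whatever keys show up there
are just assumptions for illustration):

/* Hypothetical helper: split each line of the one-line key=value format
 * into fields and print them one per line. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        FILE *f = fopen("/sys/kernel/debug/cgroup/memory", "r");
        char line[4096];

        if (!f) {
                perror("fopen");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                /* tokens look like key=value; "--" just separates the
                 * controller-specific part and carries no value */
                for (char *tok = strtok(line, " \n"); tok;
                     tok = strtok(NULL, " \n")) {
                        char *eq = strchr(tok, '=');

                        if (eq)
                                printf("%.*s -> %s\n",
                                       (int)(eq - tok), tok, eq + 1);
                }
                printf("\n");
        }
        fclose(f);
        return 0;
}

A nested multi-line format would need a real parser or a stateful awk
script before it could be filtered at all.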
>
> Thanks!
>