[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef9f7516-853d-ffe4-9a7a-5e87556bdbbe@openvz.org>
Date: Mon, 30 May 2022 16:09:00 +0300
From: Vasily Averin <vvs@...nvz.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, kernel@...nvz.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Shakeel Butt <shakeelb@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Michal Koutný <mkoutny@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
Muchun Song <songmuchun@...edance.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH mm v3 0/9] memcg: accounting for objects allocated by
mkdir cgroup
On 5/30/22 14:55, Michal Hocko wrote:
> On Mon 30-05-22 14:25:45, Vasily Averin wrote:
>> Below is tracing results of mkdir /sys/fs/cgroup/vvs.test on
>> 4cpu VM with Fedora and self-complied upstream kernel. The calculations
>> are not precise, it depends on kernel config options, number of cpus,
>> enabled controllers, ignores possible page allocations etc.
>> However this is enough to clarify the general situation.
>> All allocations are splited into:
>> - common part, always called for each cgroup type
>> - per-cgroup allocations
>>
>> In each group we consider 2 corner cases:
>> - usual allocations, important for 1-2 CPU nodes/Vms
>> - percpu allocations, important for 'big irons'
>>
>> common part: ~11Kb + 318 bytes percpu
>> memcg: ~17Kb + 4692 bytes percpu
>> cpu: ~2.5Kb + 1036 bytes percpu
>> cpuset: ~3Kb + 12 bytes percpu
>> blkcg: ~3Kb + 12 bytes percpu
>> pid: ~1.5Kb + 12 bytes percpu
>> perf: ~320b + 60 bytes percpu
>> -------------------------------------------
>> total: ~38Kb + 6142 bytes percpu
>> currently accounted: 4668 bytes percpu
>>
>> - it's important to account usual allocations called
>> in common part, because almost all of cgroup-specific allocations
>> are small. One exception here is memory cgroup, it allocates a few
>> huge objects that should be accounted.
>> - Percpu allocation called in common part, in memcg and cpu cgroups
>> should be accounted, rest ones are small an can be ignored.
>> - KERNFS objects are allocated both in common part and in most of
>> cgroups
>>
>> Details can be found here:
>> https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@openvz.org/
>>
>> I checked other cgroups types was found that they all can be ignored.
>> Additionally I found allocation of struct rt_rq called in cpu cgroup
>> if CONFIG_RT_GROUP_SCHED was enabled, it allocates huge (~1700 bytes)
>> percpu structure and should be accounted too.
>
> One thing that the changelog is missing is an explanation why do we need
> to account those objects. Users are usually not empowered to create
> cgroups arbitrarily. Or at least they shouldn't because we can expect
> more problems to happen.
>
> Could you clarify this please?
The problem is actual for OS-level containers: LXC or OpenVz.
They are widely used for hosting and allow to run containers
by untrusted end-users. Root inside such containers is able
to create groups inside own container and consume host memory
without its proper accounting.
Thank you,
Vasily Averin
Powered by blists - more mailing lists