Date:   Wed, 15 Nov 2017 11:18:13 +0000
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     <linux-mm@...ck.org>, Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tejun Heo <tj@...nel.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Dave Hansen <dave.hansen@...el.com>, <kernel-team@...com>,
        <cgroups@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] memcg: hugetlbfs basic usage accounting

On Wed, Nov 15, 2017 at 09:35:04AM +0100, Michal Hocko wrote:
> On Tue 14-11-17 17:24:29, Roman Gushchin wrote:
> > This patch implements basic accounting of memory consumption
> > by hugetlbfs pages for cgroup v2 memory controller.
> > 
> > Cgroup v2 memory controller lacks any visibility into the
> > hugetlbfs memory consumption. Cgroup v1 implemented a separate
> > hugetlbfs controller, which provided such stats, and also
> > provided some control abilities. Although porting of the
> > hugetlbfs controller to cgroup v2 is arguably a good idea and
> > is outside the scope of this patch, it's very useful to have
> > basic stats provided by memory.stat.

Hi, Michal!

> Separate hugetlb cgroup controller was really a deliberate decision.
> We didn't want to mix hugetlb with the reclaimable memory. There is no
> reasonable way to enforce memcg limits if hugetlb pages are involved.
> 
> AFAICS your patch shouldn't break the hugetlb controller because that
> one (ab)uses page[2].private to store the hstate for the accounting.
> You also do not really charge those hugetlb pages so the memcg
> accounting will work unchanged.

Yes, you are right.

> 
> So my primary question is, why don't you simply allow hugetlb controller
> rather than tweak stats for memcg? Is there any fundamental reason why
> hugetlb controller is not v2 compatible?

I really don't know if the hugetlb controller has enough users to deserve
full support in the v2 interface, which would mean adding knobs like
memory.hugetlb.current, memory.hugetlb.min, memory.hugetlb.high,
memory.hugetlb.max, etc.

I'd be rather conservative here and avoid adding a lot to the interface
without clear demand. Also, hugetlb pages are really special, and it's
at least not obvious how, say, memory.high should work for it.

At the same time we don't really have any accounting of hugetlb page
usage (except system-wide stats in sysfs). And providing such stats
is really useful.
In my particular case, I have some number of pre-allocated hugepages,
and I have several containerized workloads, which are potentially
using them for a performance gain. Having these stats makes it possible
to attribute the memory held in hugetlb pages to one of the workloads.
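
To illustrate what the system-wide sysfs stats give today, here is a
minimal sketch that reads the per-size hugepage pools. It assumes the
standard /sys/kernel/mm/hugepages layout; the function name and its
"base" parameter are made up for this example. Note that nothing in
this output says which cgroup is holding the pages.

```python
import os
import re


def read_hugepage_pools(base="/sys/kernel/mm/hugepages"):
    """Return {page_size_kb: {counter_name: value}} for each hugepage pool.

    'base' is a parameter only so the function can be pointed at a test
    directory; it is an assumption of this sketch, not a kernel interface.
    """
    pools = {}
    if not os.path.isdir(base):
        return pools
    for entry in sorted(os.listdir(base)):
        # Pool directories are named like "hugepages-2048kB".
        match = re.fullmatch(r"hugepages-(\d+)kB", entry)
        if not match:
            continue
        counters = {}
        for name in ("nr_hugepages", "free_hugepages"):
            try:
                with open(os.path.join(base, entry, name)) as f:
                    counters[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Skip counters that are missing or unreadable.
                continue
        pools[int(match.group(1))] = counters
    return pools
```

The difference nr_hugepages - free_hugepages is the number of pages in
use system-wide, with no per-workload breakdown.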

> It feels really strange to keep stats of something the controller
> doesn't really control. I can imagine confused users claiming that
> numbers just do not add up...

This is why I do not add this number to memory.current. At the same
time numbers in memory.stat are not intended to be summed (we have
event counters there, dirty pages counter, etc), so I don't see a problem.
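
As a rough sketch of how a consumer would use such a stat: memory.stat
is a flat "key value" file, so a tool just looks up the one entry it
needs and never sums the file. The "hugetlb" key name and all values
below are assumptions for illustration only; the other keys are
existing memory.stat entries.

```python
def parse_memory_stat(text):
    """Parse flat 'key value' lines, the format memory.stat uses."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Ignore blank or malformed lines.
            continue
        key, value = parts
        stats[key] = int(value)
    return stats


# Illustrative sample; "hugetlb" is a hypothetical key name and the
# numbers are made up.
SAMPLE = """\
anon 1048576
file 4194304
file_dirty 8192
pgfault 1234
hugetlb 2097152
"""

stats = parse_memory_stat(SAMPLE)
# Look up a single entry. Summing the file would mix byte counts
# (anon, file), a subset of file (file_dirty) and an event counter
# (pgfault), which is meaningless.
hugetlb_bytes = stats.get("hugetlb", 0)
```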

Thanks!
