[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1507281622470.10368@chino.kir.corp.google.com>
Date: Tue, 28 Jul 2015 16:30:19 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Jörn Engel <joern@...estorage.com>
cc: Mike Kravetz <mike.kravetz@...cle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: hugetlb pages not accounted for in rss
On Tue, 28 Jul 2015, Jörn Engel wrote:
> What would you propose for me then? I have 80% RAM or more in reserved
> hugepages. OOM-killer is not a concern, as it panics the system - the
> alternatives were almost universally silly and we didn't want to deal
> with system in unpredictable states. But knowing how much memory is
> used by which process is a concern. And if you only tell me about the
> small (and continuously shrinking) portion, I essentially fly blind.
>
> That is not a case of "may lead to breakage", it _is_ broken.
>
> Ideally we would have fixed this in 2002 when hugetlbfs was introduced.
> By now we might have to introduce a new field, rss_including_hugepages
> or whatever. Then we have to update tools like top etc. to use the new
> field when appropriate. No fun, but might be necessary.
>
> If we can get away with including hugepages in rss and fixing the OOM
> killer to be less silly, I would strongly prefer that. But I don't know
> how much of a mess we are already in.
>
It's not only the oom killer, I don't believe hugeltb pages are accounted
to the "rss" in memcg. They use the hugetlb_cgroup for that. Starting to
account for them in existing memcg deployments would cause them to hit
their memory limits much earlier. The "rss_huge" field in memcg only
represents transparent hugepages.
I agree with your comment that having done this when hugetlbfs was
introduced would have been optimal.
It's always difficult to add a new class of memory to an existing metric
("new" here because it's currently unaccounted).
If we can add yet another process metric to track hugetlbfs memory mapped,
then the test could be converted to use that. I'm not sure if the
jusitifcation would be strong enough, but you could try.
Powered by blists - more mailing lists