Message-ID: <alpine.DEB.2.10.1507281622470.10368@chino.kir.corp.google.com>
Date: Tue, 28 Jul 2015 16:30:19 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Jörn Engel <joern@...estorage.com>
cc: Mike Kravetz <mike.kravetz@...cle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: hugetlb pages not accounted for in rss
On Tue, 28 Jul 2015, Jörn Engel wrote:
> What would you propose for me then? I have 80% RAM or more in reserved
> hugepages. OOM-killer is not a concern, as it panics the system - the
> alternatives were almost universally silly and we didn't want to deal
> with a system in an unpredictable state. But knowing how much memory is
> used by which process is a concern. And if you only tell me about the
> small (and continuously shrinking) portion, I essentially fly blind.
>
> That is not a case of "may lead to breakage", it _is_ broken.
>
> Ideally we would have fixed this in 2002 when hugetlbfs was introduced.
> By now we might have to introduce a new field, rss_including_hugepages
> or whatever. Then we have to update tools like top etc. to use the new
> field when appropriate. No fun, but might be necessary.
>
> If we can get away with including hugepages in rss and fixing the OOM
> killer to be less silly, I would strongly prefer that. But I don't know
> how much of a mess we are already in.
>
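To make the accounting half of that concrete: conceptually, every map of
a hugetlb page would have to bump the mm's counters by the number of base
pages it spans, with a matching decrement on unmap. A rough, untested
sketch (add_mm_counter() and pages_per_huge_page() are existing helpers;
the hugetlb_add_rss() wrapper and its placement are purely illustrative):

#include <linux/mm.h>
#include <linux/hugetlb.h>

/*
 * Illustrative only: charge a freshly mapped hugetlb page to rss as
 * pages_per_huge_page(h) base pages.  A real patch would need the
 * matching subtraction in the unmap path and an audit of every place
 * hugetlb pages are mapped or torn down.
 */
static void hugetlb_add_rss(struct mm_struct *mm, struct hstate *h)
{
	add_mm_counter(mm, MM_FILEPAGES, pages_per_huge_page(h));
}

The arithmetic is the easy part; the consumers of the counter are not.
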
It's not only the oom killer: I don't believe hugetlb pages are accounted
toward "rss" in memcg either; they use hugetlb_cgroup for that. Starting
to account for them in existing memcg deployments would cause those
deployments to hit their memory limits much earlier. Note that the
"rss_huge" field in memcg represents only transparent hugepages, not
hugetlbfs.

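That split is easy to see from userspace. A quick, untested sketch (the
paths assume v1 hierarchies mounted at the conventional /sys/fs/cgroup
locations, a 2MB hugepage size, and the root cgroup):

#include <stdio.h>

/* Print one cgroup control file so the counters can be compared. */
static void dump(const char *path)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	printf("== %s ==\n", path);
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
}

int main(void)
{
	/* "rss" and "rss_huge" here cover anon and THP, not hugetlbfs. */
	dump("/sys/fs/cgroup/memory/memory.stat");
	/* hugetlbfs usage is charged to the hugetlb controller instead. */
	dump("/sys/fs/cgroup/hugetlb/hugetlb.2MB.usage_in_bytes");
	return 0;
}

Faulting a hugetlbfs mapping moves only the second counter; "rss" and
"rss_huge" in memory.stat stay flat.
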
I agree with your comment that doing this when hugetlbfs was introduced
would have been optimal. It's always difficult to add a new class of
memory to an existing metric ("new" here because it's currently
unaccounted).

If we can add yet another per-process metric to track mapped hugetlbfs
memory, then tools like top could be converted to use it. I'm not sure
the justification would be strong enough, but you could try.

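In the meantime, if you need per-process numbers today, you can
approximate them from /proc/pid/smaps: hugetlb VMAs show up there with a
KernelPageSize larger than the base page size (while their Rss reads 0).
An untested sketch that sums the Size of such mappings:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[64], line[256];
	long base_kb = sysconf(_SC_PAGESIZE) / 1024;
	long size_kb = 0, kps_kb, total_kb = 0;
	FILE *f;

	if (argc != 2)
		return 1;
	snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);
	f = fopen(path, "r");
	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* "Size:" precedes "KernelPageSize:" within each entry. */
		if (sscanf(line, "Size: %ld kB", &size_kb) == 1)
			continue;
		if (sscanf(line, "KernelPageSize: %ld kB", &kps_kb) == 1 &&
		    kps_kb > base_kb)
			total_kb += size_kb;
	}
	fclose(f);
	printf("hugetlb mapped: %ld kB\n", total_kb);
	return 0;
}

Note that this counts mapped virtual size rather than faulted pages, so
treat it as an upper bound.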