Message-Id: <1455827801-13082-1-git-send-email-hannes@cmpxchg.org>
Date: Thu, 18 Feb 2016 15:36:41 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: linux-mm@...ck.org
Cc: Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
linux-kernel@...r.kernel.org, kernel-team@...com
Subject: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached

Even before we added MemAvailable, users knew that page cache is
easily convertible to free memory on pressure, and estimated their
"available" memory by looking at the sum of MemFree, Cached, Buffers.
However, "Cached" is calculated using NR_FILE_PAGES, which includes
shmem and random driver pages inserted into the page tables; neither
of which are easily reclaimable, or reclaimable at all. Reclaiming
shmem requires swapping, which is slow. And unlike page cache, which
has fairly conservative dirty limits, all of shmem needs to be written
out before becoming evictable. Without swap, shmem is not evictable at
all. And driver pages certainly never are.
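
For illustration, here is a minimal userspace sketch of that
traditional estimate, next to the same figure with Shmem subtracted.
This is not part of the patch, and the meminfo_field() helper is made
up for the example:

#include <stdio.h>
#include <string.h>

/* Return a /proc/meminfo field in kB, or -1 if not found. */
static long meminfo_field(const char *name)
{
	char line[128];
	long val = -1;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, name, strlen(name))) {
			sscanf(line + strlen(name), " %ld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}

int main(void)
{
	/* The traditional formula: MemFree + Buffers + Cached. */
	long naive = meminfo_field("MemFree:") +
		     meminfo_field("Buffers:") +
		     meminfo_field("Cached:");

	printf("naive available: %ld kB\n", naive);
	/* With shmem counted in Cached, the naive figure
	 * overestimates what reclaim can actually produce: */
	printf("minus shmem:     %ld kB\n",
	       naive - meminfo_field("Shmem:"));
	return 0;
}
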
Calling these pages "Cached" is misleading and has resulted in broken
formulas in userspace. They misrepresent the memory situation and
cause either waste or unexpected OOM kills. With 64-bit and per-cpu
memory we are way past the point where the relationship between
virtual and physical memory is meaningful and users can rely on
overcommit protection. OOM kills cannot be avoided without wasting
enormous amounts of memory this way. This shifts the management burden
toward userspace, toward applications monitoring their environment and
adjusting their operations. And so where statistics like /proc/meminfo
used to be more informational, we have more and more software relying
on them to make automated decisions based on utilization.

But if userspace is supposed to take over responsibility, it needs a
clear and accurate kernel interface to base its judgement on. And one
of the requirements is certainly that memory consumers with wildly
different reclaimability are not conflated. Adding MemAvailable is a
good step in that direction, but there is software like Sigar[1] in
circulation that might not get updated anytime soon. And even then,
new users will continue to go for the intuitive interpretation of the
Cached item. We can't blame them. There are years of tradition behind
it, starting with the way free(1) and vmstat(8) have always reported
free, buffers, cached. And try as we might, using "Cached" for
unevictable memory is never going to be obvious.

The semantics of Cached including shmem and kernel pages have been
this way forever, dictated by the single-LRU implementation rather
than optimal semantics. So it's an uncomfortable proposal to change it
now. But what other way is there to fix this for existing users? What
other way to make the interface more intuitive for future users? And
what could break by removing shmem from Cached now? I guess somebody
who already subtracts Shmem from Cached.
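
To make that hypothetical breakage concrete: a tool that compensates
for the old semantics along the following lines would start
double-subtracting once shmem is taken out of Cached (reusing the
made-up meminfo_field() helper from the sketch above):

	/* Correct before this patch, double-subtracts after it: */
	long cached_clean = meminfo_field("Cached:") -
			    meminfo_field("Shmem:");
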
What are your thoughts on this?

[1] https://github.com/hyperic/sigar/blob/master/src/os/linux/linux_sigar.c#L323
---
 fs/proc/meminfo.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index df4661abadc4..e19126be1dca 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -43,14 +43,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	si_swapinfo(&i);
 	committed = percpu_counter_read_positive(&vm_committed_as);
 
-	cached = global_page_state(NR_FILE_PAGES) -
-			total_swapcache_pages() - i.bufferram;
-	if (cached < 0)
-		cached = 0;
-
 	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
 		pages[lru] = global_page_state(NR_LRU_BASE + lru);
 
+	cached = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
+	cached -= i.bufferram;
+	if (cached < 0)
+		cached = 0;
+
 	for_each_zone(zone)
 		wmark_low += zone->watermark[WMARK_LOW];
 
--
2.7.1