lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <alpine.LSU.2.11.1602181422550.2289@eggly.anvils>
Date:	Thu, 18 Feb 2016 14:57:08 -0800 (PST)
From:	Hugh Dickins <hughd@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
cc:	linux-mm@...ck.org, Konstantin Khlebnikov <koct9i@...il.com>,
	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mgorman@...e.de>,
	linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [RFC PATCH] proc: do not include shmem and driver pages in
 /proc/meminfo::Cached

On Thu, 18 Feb 2016, Johannes Weiner wrote:

> Even before we added MemAvailable, users knew that page cache is
> easily convertible to free memory on pressure, and estimated their
> "available" memory by looking at the sum of MemFree, Cached, Buffers.
> However, "Cached" is calculated using NR_FILE_PAGES, which includes
> shmem and random driver pages inserted into the page tables; neither
> of which is easily reclaimable, or reclaimable at all. Reclaiming
> shmem requires swapping, which is slow. And unlike page cache, which
> has fairly conservative dirty limits, all of shmem needs to be written
> out before becoming evictable. Without swap, shmem is not evictable at
> all. And driver pages certainly never are.
> 
> Calling these pages "Cached" is misleading and has resulted in broken
> formulas in userspace. They misrepresent the memory situation and
> cause either waste or unexpected OOM kills. With 64-bit and per-cpu
> memory we are way past the point where the relationship between
> virtual and physical memory is meaningful and users can rely on
> overcommit protection. OOM kills cannot be avoided without wasting
> enormous amounts of memory this way. This shifts the management burden
> toward userspace, toward applications monitoring their environment and
> adjusting their operations. And so where statistics like /proc/meminfo
> used to be more informational, we have more and more software relying
> on them to make automated decisions based on utilization.
> 
> But if userspace is supposed to take over responsibility, it needs a
> clear and accurate kernel interface to base its judgement on. And one
> of the requirements is certainly that memory consumers with wildly
> different reclaimability are not conflated. Adding MemAvailable is a
> good step in that direction, but there is software like Sigar[1] in
> circulation that might not get updated anytime soon. And even then,
> new users will continue to go for the intuitive interpretation of the
> Cached item. We can't blame them. There are years of tradition behind
> it, starting with the way free(1) and vmstat(8) have always reported
> free, buffers, cached. And try as we might, using "Cached" for
> unevictable memory is never going to be obvious.
> 
> The semantics of Cached including shmem and kernel pages have been
> this way forever, dictated by the single-LRU implementation rather
> than optimal semantics. So it's an uncomfortable proposal to change it
> now. But what other way to fix this for existing users? What other way
> to make the interface more intuitive for future users? And what could
> break by removing it now? I guess somebody who already subtracts Shmem
> from Cached.
> 
> What are your thoughts on this?

My thoughts are NAK.  A misleading stat is not so bad as a
misleading stat whose meaning we change in some random kernel.

By all means improve Documentation/filesystems/proc.txt on Cached.
By all means promote Active(file)+Inactive(file)-Buffers as often a
better measure (though Buffers itself is obscure to me - is it intended
usually to approximate resident FS metadata?).  By all means work on
/proc/meminfo-v2 (though that may entail dispiritingly long discussions).

We have to assume that Cached has been useful to some people, and that
they've learnt to subtract Shmem from it, if slow or no swap concerns them.

Added Konstantin to Cc: he's had valuable experience of people learning
to adapt to the numbers that we put out.

Hugh

> 
> [1] https://github.com/hyperic/sigar/blob/master/src/os/linux/linux_sigar.c#L323
> ---
>  fs/proc/meminfo.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index df4661abadc4..e19126be1dca 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -43,14 +43,14 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  	si_swapinfo(&i);
>  	committed = percpu_counter_read_positive(&vm_committed_as);
>  
> -	cached = global_page_state(NR_FILE_PAGES) -
> -			total_swapcache_pages() - i.bufferram;
> -	if (cached < 0)
> -		cached = 0;
> -
>  	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
>  		pages[lru] = global_page_state(NR_LRU_BASE + lru);
>  
> +	cached = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
> +	cached -= i.bufferram;
> +	if (cached < 0)
> +		cached = 0;
> +
>  	for_each_zone(zone)
>  		wmark_low += zone->watermark[WMARK_LOW];
>  
> -- 
> 2.7.1
