Message-ID: <D2E7B337.D5404%nag@cisco.com>
Date: Tue, 16 Feb 2016 02:58:04 +0000
From: "Nag Avadhanam (nag)" <nag@...co.com>
To: "Theodore Ts'o" <tytso@....edu>,
"Daniel Walker (danielwa)" <danielwa@...co.com>
CC: Dave Chinner <david@...morbit.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
"Khalid Mughal (khalidm)" <khalidm@...co.com>,
"xe-kernel@...ernal.cisco.com" <xe-kernel@...ernal.cisco.com>,
"dave.hansen@...el.com" <dave.hansen@...el.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"riel@...hat.com" <riel@...hat.com>,
Jonathan Corbet <corbet@....net>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count
We have a class of platforms that are essentially swap-less embedded
systems with limited memory resources (2 GB or less).
There is a need to implement early alerts (before the OOM killer kicks in)
based on current memory usage, so admins can take appropriate steps
(continue supporting existing services but defer new provisioning
operations, de-provision certain services, etc., depending on the extent
of memory usage in the system).
There is also a general need to let end users know the available memory so
they can determine whether they can enable new services (this helps in
planning).
Both of these depend on knowing the approximate memory usage within the
system (accurate to within a few tens of MB). We want to alert admins
before the system exhibits any thrashing behavior.
We find the source of the accounting anomalies to be the page cache
accounting; anonymous page accounting is fine. Page cache usage on our
systems can be attributed to the file system cache, the shared memory
stores (non-reclaimable), and the in-memory file systems
(non-reclaimable). We know the sizes of the shared memory stores and of
the in-memory file systems.
If we can determine the amount of reclaimable file system cache (to within
a few tens of MB), we can improve the serviceability of these systems.
Total - (# of bytes of anon pages + # of bytes of shared memory/tmpfs
pages + # of bytes of non-reclaimable file system cache pages) gives us a
measure of the available memory.
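The arithmetic above can be sketched as follows. Only MemTotal, AnonPages
and Shmem map directly onto /proc/meminfo fields; the non-reclaimable
cache figure is assumed to be known out of band (e.g. from the configured
sizes of the in-memory file systems), and the sample numbers are made up
for illustration:

```python
def estimate_available_kb(mem_total_kb, anon_pages_kb, shmem_kb,
                          nonreclaimable_cache_kb):
    """Total - (anon + shm/tmpfs + non-reclaimable file cache)."""
    used = anon_pages_kb + shmem_kb + nonreclaimable_cache_kb
    return mem_total_kb - used

def read_meminfo():
    """Parse /proc/meminfo into a {field: kB} dict (Linux only)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # values are reported in kB
    return info

# Illustrative numbers for a 2 GB box: 600 MB anon, 300 MB shmem/tmpfs,
# 200 MB of page cache known to be non-reclaimable.
print(estimate_available_kb(2 * 1024 * 1024, 600 * 1024,
                            300 * 1024, 200 * 1024))
```

The hard part, as noted below, is obtaining the non-reclaimable cache
term; the subtraction itself is trivial.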
It's the calculation of the # of bytes of non-reclaimable file system
cache pages that has been troubling us. We do not want to count inactive
file pages (of programs/binaries) that were once mapped by any process in
the system as reclaimable, because reclaiming them might lead to thrashing
under memory pressure (we want to alert admins before the system starts
dropping text pages).
From our experiments, we determined that running a VM scan looking for
droppable pages comes close to establishing that number. If there are
cheaper ways of determining this stat, please let us know.
Thanks,
nag
On 2/15/16, 4:45 PM, "Theodore Ts'o" <tytso@....edu> wrote:
>On Mon, Feb 15, 2016 at 03:52:31PM -0800, Daniel Walker wrote:
>> >>We need it to determine accurately what the free memory in the
>> >>system is. If you know where we can get this information already
>> >>please tell, we aren't aware of it. For instance /proc/meminfo isn't
>> >>accurate enough.
>>
>> Approximate point-in-time indication is an accurate characterization
>> of what we are doing, and it is good enough for us. No matter what we
>> do, we are never going to be able to close the "time of check to
>> time of use" window. But this approximation works reasonably well
>> for our use case.
>
>Why do you need such accuracy, and what do you consider "good enough"?
>Having something which iterates over all of the inodes in the system
>is something that really shouldn't be in a general production kernel.
>At the very least it should only be accessible by root (so that only a
>careless system administrator can DOS attack the system), but Dave's
>original question still stands. Why do you need a certain
>level of accuracy regarding how much memory is available after
>dropping all of the caches? What problem are you trying to
>solve/avoid?
>
>It may be that you are going about things completely the wrong way,
>which is why understanding the higher order problem you are trying to
>solve might be helpful in finding something which is safer,
>architecturally cleaner, and something that could go into the upstream
>kernel.
>
>Cheers,
>
> - Ted