Message-ID: <1290529171.2390.7994.camel@nimitz>
Date:	Tue, 23 Nov 2010 08:19:31 -0800
From:	Dave Hansen <dave@...ux.vnet.ibm.com>
To:	Peter Schüller <scode@...tify.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Mattias de Zalenski <zalenski@...tify.com>,
	linux-mm@...ck.org
Subject: Re: Sudden and massive page cache eviction

On Tue, 2010-11-23 at 10:44 +0100, Peter Schüller wrote:
> > You don't have anybody messing with /proc/sys/vm/drop_caches, do you?
> 
> Highly unlikely given that (1) evictions, while often very
> significant, are usually not *complete* (although the first graph
> example I provided had a more or less complete eviction) and (2) the
> evictions are not obviously periodic indicating some kind of cron job,
> and (3) we see the evictions happening across a wide variety of
> machines.
> 
> So yes, I feel confident that we are not accidentally doing that.

Yeah, drop_caches doesn't seem very likely.

Your postgres data looks the cleanest and is probably the easiest to
analyze.  Might as well start there:

	http://files.spotify.com/memcut/postgresql_weekly.png

As you said, it might not be the same as the others, but it's a decent
place to start.  If someone used drop_caches or if someone was randomly
truncating files, we'd expect to see the active/inactive lines both drop
by relatively equivalent amounts, and see them happen at _exactly_ the
same time as the cache eviction.  The eviction about 1/3 of the way
through Wednesday in the above graph kinda looks this way, but it's the
exception.

Just eyeballing it, _most_ of the evictions seem to happen after some
movement in the active/inactive lists.  We see an "inactive" uptick as
we start to launder pages, and the page activation doesn't keep up with
it.  This is a _bit_ weird since we don't see any slab cache or other
users coming to fill the new space.  Something _wanted_ the memory, so
why isn't it being used?

Do you have any large page (hugetlbfs) or other higher-order (> 1 page)
allocations happening in the kernel?
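
For the hugetlbfs half of that question, a quick look at /proc/meminfo is
probably enough.  A minimal sketch, assuming the usual "Name:   value [kB]"
layout of that file; the function names are made up for illustration, and
other higher-order kernel allocations will not show up here:

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (unit dropped)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def hugepage_fields(text):
    """Return only the hugetlb-related fields (HugePages_*, Hugepagesize)."""
    return {
        name: value
        for name, value in parse_meminfo(text).items()
        if name.lower().startswith("hugepage")
    }
```

Something like hugepage_fields(open("/proc/meminfo").read()) on an affected
box should show whether any huge pages are reserved at all.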

If you could start recording /proc/{vmstat,buddyinfo,meminfo,slabinfo},
it would be immensely useful.  The munin graphs are really great, but
they don't have the detail which you can get from stuff like vmstat.
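
One way to do the recording is a small loop that appends timestamped copies
of those files to per-file logs.  This is only a sketch under assumptions:
the helper names, the 10 second default interval and the log layout are all
invented here, and slabinfo may be readable only by root depending on the
kernel, so unreadable files are simply skipped:

```python
import time
from pathlib import Path

# Files to sample; "buddyinfo" is the buddy allocator file under /proc.
PROC_FILES = ("vmstat", "buddyinfo", "meminfo", "slabinfo")


def snapshot(proc_root="/proc", names=PROC_FILES):
    """Return {name: contents} for each readable file; skip unreadable ones."""
    out = {}
    for name in names:
        try:
            out[name] = (Path(proc_root) / name).read_text()
        except OSError:
            # Missing file or insufficient permissions: leave it out.
            continue
    return out


def record(out_dir, interval=10.0, count=None, proc_root="/proc"):
    """Append timestamped snapshots to <out_dir>/<name>.log.

    Runs forever when count is None; returns the number of samples taken.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    taken = 0
    while count is None or taken < count:
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        for name, text in snapshot(proc_root).items():
            with open(out_dir / f"{name}.log", "a") as fh:
                fh.write(f"=== {stamp}\n{text}\n")
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
    return taken
```

Calling record("/var/tmp/procstats") (a hypothetical path) and leaving it
running across one of the evictions should give enough detail to line up
against the munin graphs.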

> Further, we have observed the kernel's unwillingness to retain data in
> page cache under interesting circumstances:
> 
> (1) page cache eviction happens
> (2) we warm up our BDB files by cat:ing them (simple but effective)
> (3) within a matter of minutes, while there is still several GB of
> free (truly free, not page cached), these are evicted (as evidenced by
> re-cat:ing them a little while later)
>
> This latest observation we understand may be due to NUMA related
> allocation issues, and we should probably try to use numactl to ask
> for a more even allocation. We have not yet tried this. However, it is
> not clear how any issues having to do with that would cause sudden
> eviction of data already *in* the page cache (on whichever node)..

For a page-cache-heavy workload where you care a lot more about things
being _in_ cache rather than having good NUMA locality, you probably
want "zone_reclaim_mode" set to 0:

	http://www.kernel.org/doc/Documentation/sysctl/vm.txt

That'll be a bit more comprehensive than messing with numactl.  It
really is the best thing if you just don't care about NUMA latencies all
that much.  What kind of hardware is this, btw?
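
To check what a box is currently set to, something along these lines should
do.  A rough sketch: the function names are made up, the bit meanings are
taken from the vm.txt document linked above, and writing the file needs
root:

```python
from pathlib import Path

ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"

# Bit meanings as listed in Documentation/sysctl/vm.txt.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe_zone_reclaim(value):
    """Turn the numeric mode into a list of human-readable flags."""
    value = int(value)
    if value == 0:
        return ["zone reclaim off"]
    return [label for bit, label in FLAGS.items() if value & bit]


def read_zone_reclaim(path=ZONE_RECLAIM_PATH):
    """Read the current mode as an integer."""
    return int(Path(path).read_text().strip())


def disable_zone_reclaim(path=ZONE_RECLAIM_PATH):
    """Write 0 to the sysctl file (requires root on a real system)."""
    Path(path).write_text("0\n")
```

describe_zone_reclaim(read_zone_reclaim()) shows the current state, and
disable_zone_reclaim() has the same effect as "sysctl -w
vm.zone_reclaim_mode=0".  Neither survives a reboot unless the setting also
goes into the sysctl configuration.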

-- Dave
