linux-kernel - Re: Sudden and massive page cache eviction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101125011848.GB29511@hostway.ca>
Date:	Wed, 24 Nov 2010 17:18:48 -0800
From:	Simon Kirby <sim@...tway.ca>
To:	Peter Sch??ller <scode@...tify.com>
Cc:	Pekka Enberg <penberg@...nel.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Mattias de Zalenski <zalenski@...tify.com>,
	linux-mm@...ck.org
Subject: Re: Sudden and massive page cache eviction

On Wed, Nov 24, 2010 at 04:32:39PM +0100, Peter Sch??ller wrote:

> >> I forgot to address the second part of this question: How would I best
> >> inspect whether the kernel is doing that?
> >
> > You can, for example, record
> >
> > ??cat /proc/meminfo | grep Huge
> >
> > for large page allocations.
> 
> Those show zero a per my other post. However I got the impression Dave
> was asking about regular but larger-than-one-page allocations internal
> to the kernel, while the Huge* lines in /proc/meminfo refers to
> allocations specifically done by userland applications doing huge page
> allocation on a system with huge pages enabled - or am I confused?

Your page cache dents don't seem quite as big, so it may be something
else, but if it's the same problem we're seeing here, it seems to have to
do with when an order=3 new_slab allocation comes in to grows the kmalloc
slab cache for an __alloc_skb (network packet).  This is normal even
without jumbo frames now.  When there are no zones with order=3
zone_watermark_ok(), kswapd is woken, which frees things all over the
place to try to get zone_watermark_ok(order=3) to be happy.

We're seeing this throw out a huge number of pages, and we're seeing it
happen even with lots of memory free in the zone.  CONFIG_COMPACTION also
currently does not help because try_to_compact_pages() returns early with
COMPACT_SKIPPED if order <= PAGE_ALLOC_COSTLY_ORDER, and, you guessed it,
PAGE_ALLOC_COSTLY_ORDER is set to 3.

I reimplemented zone_pages_ok(order=3) in userspace, and I can see it
happen:

Code here: http://0x.ca/sim/ref/2.6.36/buddyinfo_scroll

  Zone order:0      1     2     3    4 5 6 7 8 9 A nr_free state

 DMA32   19026  33652  4897    13    5 1 2 0 0 0 0  106262 337 <= 256
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   19301  33869  4665    12    5 1 2 0 0 0 0  106035 329 <= 256
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   19332  33931  4603     9    5 1 2 0 0 0 0  105918 305 <= 256
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   19467  34057  4468     6    5 1 2 0 0 0 0  105741 281 <= 256
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   19591  34181  4344     5    5 1 2 0 0 0 0  105609 273 <= 256
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   19856  34348  4109     2    5 1 2 0 0 0 0  105244 249 <= 256 !!!
Normal     450      0     0     0    0 0 0 0 0 0 0     450 -7 <= 238
 DMA32   24088  36476  5437   144    5 1 2 0 0 0 0  120180 1385 <= 256
Normal    1024      1     0     0    0 0 0 0 0 0 0    1026 -5 <= 238
 DMA32   26453  37440  6676   623   53 1 2 0 0 0 0  134029 5985 <= 256
Normal    8700    100     0     0    0 0 0 0 0 0 0    8900 193 <= 238
 DMA32   48881  38161  7142   966   81 1 2 0 0 0 0  162955 9177 <= 256
Normal    8936    102     0     1    0 0 0 0 0 0 0    9148 205 <= 238
 DMA32   66046  40051  7871  1409  135 2 2 0 0 0 0  191256 13617 <= 256
Normal    9019     18     0     0    0 0 0 0 0 0 0    9055 29 <= 238
 DMA32   67133  48671  8231  1578  143 2 2 0 0 0 0  212503 15097 <= 256

So, kswapd was woken up at the line that ends in "!!!" there, because
free_pages(249) <= min(256), and so zone_watermark_ok() returned 0, when
an order=3 allocation came in.

Maybe try out that script and see if you see something similar.

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/