Message-Id: <200910122244.19666.borntraeger@de.ibm.com>
Date:	Mon, 12 Oct 2009 22:44:19 +0200
From:	Christian Borntraeger <borntraeger@...ibm.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Elladan <elladan@...imo.com>, Nick Piggin <npiggin@...e.de>,
	Andi Kleen <andi@...stfloor.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Minchan Kim <minchan.kim@...il.com>
Subject: oomkiller over-ambitious after "vmscan: make mapped executable pages the first class citizen" (bisected)

I have seen some OOM-killer action on my s390x system when using large amounts 
of anonymous memory:

[cborntra@...lp34 ~]$ cat memeat.c
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
        char *start;
        char *a;
        start = mmap(NULL, 4300000000UL,
                    PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1 , 0);
        if (start == MAP_FAILED) {
                printf("cannot map guest memory\n");
                exit (1);
        }
        for (a = start; a < start + 4300000000UL; a += 4096)
            *a='a';
        exit(0);
}
[cborntra@...lp34 ~]$ ./memeat
Connection to t63lp34 closed.


I attached the dmesg with the oom messages.

As you can see we are failing several order 0 allocations with gfpmask=0x201da. 

The application uses slightly more memory than is available. However, there is 
plenty of swap space to fulfill the (non-atomic) request:

[cborntra@...lp34 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       4166560     127148    4039412          0       2256      19752
-/+ buffers/cache:     105140    4061420
Swap:      9615904       8328    9607576

Since old kernels never showed OOM, I was able to bisect the first kernel that 
shows this behaviour:
commit 8cab4754d24a0f2e05920170c845bd84472814c6
Author: Wu Fengguang <fengguang.wu@...el.com>
    vmscan: make mapped executable pages the first class citizen

In fact, applying this patch makes the problem go away:
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -1345,22 +1345,8 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
 			nr_rotated++;
-			/*
-			 * Identify referenced, file-backed active pages and
-			 * give them one more trip around the active list. So
-			 * that executable code get better chances to stay in
-			 * memory under moderate memory pressure.  Anon pages
-			 * are not likely to be evicted by use-once streaming
-			 * IO, plus JVM can create lots of anon VM_EXEC pages,
-			 * so we ignore them here.
-			 */
-			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
-				list_add(&page->lru, &l_active);
-				continue;
-			}
-		}
 
 		ClearPageActive(page);	/* we are de-activating */
 		list_add(&page->lru, &l_inactive);



The interesting part is that s390x in its default configuration has no 
no-execute feature, resulting in the following map:
c0000000-1c04cd000 rwxs 00000000 00:04 18517        /dev/zero (deleted)
As you can see, this area looks file mapped (/dev/zero) and executable. On the 
other hand, the !PageAnon clause should cover this case. I am lost.

Does anybody on the CC (taken from the original patch) have an idea what the 
problem is and how to fix this properly?

Christian

View attachment "dmesg.txt" of type "text/plain" (20462 bytes)
