Message-Id: <20100126183412.6AC9.A69D9226@jp.fujitsu.com>
Date:	Tue, 26 Jan 2010 20:07:43 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	"Roman Jarosz" <kedgedev@...il.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	lkml <linux-kernel@...r.kernel.org>,
	A Rojas <nqn1976list@...il.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	"A. Boulan" <arnaud.boulan@...ertysurf.fr>, michael@...nelt.co.at,
	jcnengel@...glemail.com, rientjes@...gle.com, earny@...4u.de,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Eric Anholt <eric@...olt.net>,
	Chris Wilson <chris@...is-wilson.co.uk>
Subject: Re: OOM-Killer kills too much with 2.6.32.2

(Restored all CCs and added Hugh and Chris)


> > Hi all,
> >
> > Strangely, all reproducing machines are x86_64 with Intel i915, but I don't
> > have any solid evidence.
> > Can anyone please apply following debug patch and reproduce this issue?
> >
> > This patch writes some debug messages into /var/log/messages.
> >
> 
> Here it is
> 
> Jan 26 09:34:32 kedge kernel: ->fault OOM shmem_fault 1 1
> Jan 26 09:34:32 kedge kernel: X invoked oom-killer: gfp_mask=0x0, order=0,
> oom_adj=0
> Jan 26 09:34:32 kedge kernel: Pid: 1927, comm: X Not tainted 2.6.33-rc5 #3


Thank you very much!!

Current status and analysis:
  - The OOM killer is invoked via VM_FAULT_OOM in the page fault path.
  - GEM uses shmem heavily internally, and i915 uses GEM.
  - The VM_FAULT_OOM comes from shmem.
  - shmem allocates memory using mapping_gfp_mask(inode->i_mapping);
    if that allocation fails it returns -ENOMEM, and -ENOMEM is turned into VM_FAULT_OOM.
  - However, GEM has the following code.


drm_gem.c drm_gem_object_alloc()
--------------------
        obj->filp = shmem_file_setup("drm mm object", size, VM_NORESERVE);
(snip)
        /* Basically we want to disable the OOM killer and handle ENOMEM
         * ourselves by sacrificing pages from cached buffers.
         * XXX shmem_file_[gs]et_gfp_mask()
         */
        mapping_set_gfp_mask(obj->filp->f_path.dentry->d_inode->i_mapping,
                             GFP_HIGHUSER |
                             __GFP_COLD |
                             __GFP_FS |
                             __GFP_RECLAIMABLE |
                             __GFP_NORETRY |
                             __GFP_NOWARN |
                             __GFP_NOMEMALLOC);
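
To make the failure path concrete, here is a rough sketch of the chain as I
understand it (simplified, not verbatim kernel source; the real signatures
differ slightly):

mm/shmem.c shmem_fault() + mm/oom_kill.c (simplified sketch)
--------------------
        /* shmem allocates the page with the mapping's gfp mask, which GEM
         * has set to include __GFP_NORETRY above, so the allocation can
         * simply fail under memory pressure instead of retrying.
         */
        error = shmem_getpage(inode, vmf->pgoff, &vmf->page, SGP_CACHE, ...);
        if (error == -ENOMEM)
                return VM_FAULT_OOM;    /* shmem can only report OOM here */

        /* The arch fault handler sees VM_FAULT_OOM from handle_mm_fault()
         * and calls pagefault_out_of_memory(), which invokes the OOM killer
         * with unknown gfp_mask and order -- hence the "gfp_mask=0x0,
         * order=0" line in the log above.
         */
--------------------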


This comment in drm_gem_object_alloc() is a lie: __GFP_NORETRY causes the ENOMEM
inside shmem, not in GEM itself, so GEM can neither handle it nor recover from it.
I suspect the following commit is wrong.

----------------------------------------------------
commit 07f73f6912667621276b002e33844ef283d98203
Author: Chris Wilson <chris@...is-wilson.co.uk>
Date:   Mon Sep 14 16:50:30 2009 +0100

    drm/i915: Improve behaviour under memory pressure

    Due to the necessity of having to take the struct_mutex, the i915
    shrinker can not free the inactive lists if we fail to allocate memory
    whilst processing a batch buffer, triggering an OOM and an ENOMEM that
    is reported back to userspace. In order to fare better under such
    circumstances we need to manually retry a failed allocation after
    evicting inactive buffers.

    To do so involves 3 steps:
    1. Marking the backing shm pages as NORETRY.
    2. Updating the get_pages() callers to evict something on failure and then
       retry.
    3. Revamping the evict something logic to be smarter about the required
       buffer size and prefer to use volatile or clean inactive pages.

    Signed-off-by: Chris Wilson <chris@...is-wilson.co.uk>
    Signed-off-by: Jesse Barnes <jbarnes@...tuousgeek.org>
----------------------------------------------------
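
For reference, the intent of step 2 above is roughly the pattern below (only
a sketch; the actual i915 helpers and their arguments differ in detail):

drivers/gpu/drm/i915 (sketch of the intended recovery)
--------------------
        /* get_pages() may now fail quickly because of __GFP_NORETRY; the
         * caller is supposed to evict inactive buffers and retry instead of
         * letting -ENOMEM escape.
         */
        ret = i915_gem_object_get_pages(obj, __GFP_NORETRY | __GFP_NOWARN);
        if (ret == -ENOMEM) {
                ret = i915_gem_evict_something(dev, obj->size);
                if (ret == 0)
                        ret = i915_gem_object_get_pages(obj, 0);
        }
--------------------

But as far as I can see, the shmem_fault() path in the log above never goes
through these callers, so nothing catches the -ENOMEM there.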


Unfortunately, it can't be reverted easily.
So, can you please try the following partial revert patch?



From a27115f93d4f3ff6538860e69a7b444761cef91b Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Date: Tue, 26 Jan 2010 19:51:57 +0900
Subject: [PATCH] Revert NORETRY

---
 drivers/gpu/drm/drm_gem.c |   13 -------------
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index e9dbb48..8bf3770 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -142,19 +142,6 @@ drm_gem_object_alloc(struct drm_device *dev, size_t size)
 	if (IS_ERR(obj->filp))
 		goto free;
 
-	/* Basically we want to disable the OOM killer and handle ENOMEM
-	 * ourselves by sacrificing pages from cached buffers.
-	 * XXX shmem_file_[gs]et_gfp_mask()
-	 */
-	mapping_set_gfp_mask(obj->filp->f_path.dentry->d_inode->i_mapping,
-			     GFP_HIGHUSER |
-			     __GFP_COLD |
-			     __GFP_FS |
-			     __GFP_RECLAIMABLE |
-			     __GFP_NORETRY |
-			     __GFP_NOWARN |
-			     __GFP_NOMEMALLOC);
-
 	kref_init(&obj->refcount);
 	kref_init(&obj->handlecount);
 	obj->size = size;
-- 
1.6.5.2



