linux-kernel - Re: next-20200515: Xorg killed due to "OOM"

Message-ID: <20200601093142.GE1161@dhcp22.suse.cz>
Date:   Mon, 1 Jun 2020 11:31:42 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Pavel Machek <pavel@....cz>
Cc:     kernel list <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...l.org>, airlied@...ux.ie,
        daniel@...ll.ch, dri-devel@...ts.freedesktop.org,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
        hpa@...or.com
Subject: Re: next-20200515: Xorg killed due to "OOM"

On Sun 31-05-20 14:16:01, Pavel Machek wrote:
> On Thu 2020-05-28 14:07:50, Michal Hocko wrote:
> > On Thu 28-05-20 14:03:54, Pavel Machek wrote:
> > > On Thu 2020-05-28 11:05:17, Michal Hocko wrote:
> > > > On Tue 26-05-20 11:10:54, Pavel Machek wrote:
> > > > [...]
> > > > > [38617.276517] oom_reaper: reaped process 31769 (chromium), now anon-rss:0kB, file-rss:0kB, shmem-rss:7968kB
> > > > > [38617.277232] Xorg invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
> > > > > [38617.277247] CPU: 0 PID: 2978 Comm: Xorg Not tainted 5.7.0-rc5-next-20200515+ #117
> > > > > [38617.277256] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> > > > > [38617.277266] Call Trace:
> > > > > [38617.277286]  dump_stack+0x54/0x6e
> > > > > [38617.277300]  dump_header+0x45/0x321
> > > > > [38617.277313]  oom_kill_process.cold+0x9/0xe
> > > > > [38617.277324]  ? out_of_memory+0x167/0x420
> > > > > [38617.277336]  out_of_memory+0x1f2/0x420
> > > > > [38617.277348]  pagefault_out_of_memory+0x34/0x56
> > > > > [38617.277361]  mm_fault_error+0x4a/0x130
> > > > > [38617.277372]  do_page_fault+0x3ce/0x416
> > > > 
> > > > The reason the OOM killer has been invoked is that the page fault
> > > > handler has returned VM_FAULT_OOM. So this is not a result of the page
> > > > allocator struggling to allocate memory. It would be interesting to
> > > > check which code path has returned this. 
> > > 
> > > Should the core WARN_ON if that happens and there's enough memory, or
> > > something like that?
> > 
> > I wish it would simply go away. There really shouldn't be any reason for
> > VM_FAULT_OOM to exist. The real low on memory situation is already
> > handled in the page allocator.
> 
> Umm. Maybe the WARN_ON is a first step in that direction? So we can see
> what driver actually did that, and complain to its authors?

This is much harder to do than it seems. But maybe this doesn't really
need full coverage. Some of the code paths which return VM_FAULT_OOM
will simply not fail. But checking for vma->vm_ops->fault() failures
might be interesting. Does the following tell you more about the failure
you are seeing?

diff --git a/mm/memory.c b/mm/memory.c
index 9ab00dcb95d4..5ff023ab7b49 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3442,8 +3442,11 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 
 	ret = vma->vm_ops->fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
-			    VM_FAULT_DONE_COW)))
+			    VM_FAULT_DONE_COW))) {
+		if (unlikely(ret & VM_FAULT_OOM))
+			pr_warn("VM_FAULT_OOM returned from %ps\n", vma->vm_ops->fault);
 		return ret;
+	}
 
 	if (unlikely(PageHWPoison(vmf->page))) {
 		if (ret & VM_FAULT_LOCKED)

-- 
Michal Hocko
SUSE Labs