linux-kernel - Re: [patch 08/12] mm: page_alloc: wait for OOM killer progress before retrying

Message-ID: <20150326112841.GD18560@cmpxchg.org>
Date:	Thu, 26 Mar 2015 07:28:41 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org, ying.huang@...el.com,
	aarcange@...hat.com, david@...morbit.com, mhocko@...e.cz,
	tytso@....edu
Subject: Re: [patch 08/12] mm: page_alloc: wait for OOM killer progress
 before retrying

On Wed, Mar 25, 2015 at 06:01:48PM +0100, Vlastimil Babka wrote:
> On 03/25/2015 03:15 PM, Tetsuo Handa wrote:
> >Johannes Weiner wrote:
> >>diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >>index 5cfda39b3268..e066ac7353a4 100644
> >>--- a/mm/oom_kill.c
> >>+++ b/mm/oom_kill.c
> >>@@ -711,12 +711,15 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
> >>  		killed = 1;
> >>  	}
> >>  out:
> >>+	if (test_thread_flag(TIF_MEMDIE))
> >>+		return true;
> >>  	/*
> >>-	 * Give the killed threads a good chance of exiting before trying to
> >>-	 * allocate memory again.
> >>+	 * Wait for any outstanding OOM victims to die.  In rare cases
> >>+	 * victims can get stuck behind the allocating tasks, so the
> >>+	 * wait needs to be bounded.  It's crude alright, but cheaper
> >>+	 * than keeping a global dependency tree between all tasks.
> >>  	 */
> >>-	if (killed)
> >>-		schedule_timeout_killable(1);
> >>+	wait_event_timeout(oom_victims_wait, !atomic_read(&oom_victims), HZ);
> >>
> >>  	return true;
> >>  }
> >
> >out_of_memory() returning true with bounded wait effectively means that
> >wait forever without choosing subsequent OOM victims when first OOM victim
> >failed to die. The system will lock up, won't it?
> 
> And after patch 12, does this mean that you may not be waiting long enough
> for the victim to die, before you fail the allocation, prematurely? I can
> imagine there would be situations where the victim is not deadlocked, but
> still take more than HZ to finish, no?

Arguably it should be reasonable to fail allocations once the OOM
victim is stuck for over a second and the OOM reserves have been
depleted.

On the other hand, we don't need to play it that tight, because that
timeout is only targeted at the victim-blocked-on-alloc situations,
which aren't all that common.  Something like 5 seconds should still
be okay.