[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171030141815.lk76bfetmspf7f4x@dhcp22.suse.cz>
Date: Mon, 30 Oct 2017 15:18:15 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrea Arcangeli <aarcange@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Manish Jaggi <mjaggi@...iumnetworks.com>,
Mel Gorman <mgorman@...e.de>, Oleg Nesterov <oleg@...hat.com>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH] mm,oom: Try last second allocation before and after
selecting an OOM victim.
On Sat 28-10-17 17:07:09, Tetsuo Handa wrote:
> This patch splits last second allocation attempt into two locations, once
> before selecting an OOM victim and again after selecting an OOM victim,
> and uses normal watermark for last second allocation attempts.
Why do we need both?
> As of linux-2.6.11, nothing prevented from concurrently calling
> out_of_memory(). TIF_MEMDIE test in select_bad_process() tried to avoid
> needless OOM killing. Thus, it was safe to do __GFP_DIRECT_RECLAIM
> allocation (apart from which watermark should be used) just before
> calling out_of_memory().
>
> As of linux-2.6.24, try_set_zone_oom() was added to
> __alloc_pages_may_oom() by commit ff0ceb9deb6eb017 ("oom: serialize out
> of memory calls") which effectively started acting as a kind of today's
> mutex_trylock(&oom_lock).
>
> As of linux-4.2, try_set_zone_oom() was replaced with oom_lock by
> commit dc56401fc9f25e8f ("mm: oom_kill: simplify OOM killer locking").
> At least by this time, it became no longer safe to do
> __GFP_DIRECT_RECLAIM allocation with oom_lock held.
>
> And as of linux-4.13, last second allocation attempt stopped using
> __GFP_DIRECT_RECLAIM by commit e746bf730a76fe53 ("mm,page_alloc: don't
> call __node_reclaim() with oom_lock held.").
>
> Therefore, there is no longer valid reason to use ALLOC_WMARK_HIGH for
> last second allocation attempt [1].
Another reason to use the high watermark as explained by Andrea was
"
: Elaborating the comment: the reason for the high wmark is to reduce
: the likelihood of livelocks and be sure to invoke the OOM killer, if
: we're still under pressure and reclaim just failed. The high wmark is
: used to be sure the failure of reclaim isn't going to be ignored. If
: using the min wmark like you propose there's risk of livelock or
: anyway of delayed OOM killer invocation.
"
How is that affected by changes in locking you discribe above?
> And this patch changes to do normal
> allocation attempt, with handling of ALLOC_OOM added in order to mitigate
> extra OOM victim selection problem reported by Manish Jaggi [2].
>
> Doing really last second allocation attempt after selecting an OOM victim
> will also help the OOM reaper to start reclaiming memory without waiting
> for oom_lock to be released.
The changelog is much more obscure than it really needs to be. You fail
to explain _why_ we need this and and _what_ the actual problem is. You
are simply drowning in details here (btw. this is not the first time
your changelog has this issues). Try to focus on _what_ is the problem
_why_ do we care and _how_ are you addressing it.
[...]
> +struct page *alloc_pages_before_oomkill(struct oom_control *oc)
> +{
> + /*
> + * Make sure that this allocation attempt shall not depend on
> + * __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation, for the caller is
> + * already holding oom_lock.
> + */
> + const gfp_t gfp_mask = oc->gfp_mask & ~__GFP_DIRECT_RECLAIM;
> + struct alloc_context *ac = oc->ac;
> + unsigned int alloc_flags = gfp_to_alloc_flags(gfp_mask);
> + const int reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
> +
> + /* Need to update zonelist if selected as OOM victim. */
> + if (reserve_flags) {
> + alloc_flags = reserve_flags;
> + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> + ac->high_zoneidx, ac->nodemask);
> + }
Why do we need this zone list rebuilding?
> + return get_page_from_freelist(gfp_mask, oc->order, alloc_flags, ac);
> +}
> +
> static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> int preferred_nid, nodemask_t *nodemask,
> struct alloc_context *ac, gfp_t *alloc_mask,
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists