linux-kernel - Re: [PATCH 0/3] OOM detection rework v4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160302140611.GI26686@dhcp22.suse.cz>
Date:	Wed, 2 Mar 2016 15:06:11 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Joonsoo Kim <js1304@...il.com>
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Mel Gorman <mgorman@...e.de>,
	David Rientjes <rientjes@...gle.com>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Hillf Danton <hillf.zj@...baba-inc.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [PATCH 0/3] OOM detection rework v4

On Wed 02-03-16 22:32:09, Joonsoo Kim wrote:
> 2016-03-02 18:50 GMT+09:00 Michal Hocko <mhocko@...nel.org>:
> > On Wed 02-03-16 11:19:54, Joonsoo Kim wrote:
> >> On Mon, Feb 29, 2016 at 10:02:13PM +0100, Michal Hocko wrote:
> > [...]
> >> > > + /*
> >> > > +  * OK, so the watermak check has failed. Make sure we do all the
> >> > > +  * retries for !costly high order requests and hope that multiple
> >> > > +  * runs of compaction will generate some high order ones for us.
> >> > > +  *
> >> > > +  * XXX: ideally we should teach the compaction to try _really_ hard
> >> > > +  * if we are in the retry path - something like priority 0 for the
> >> > > +  * reclaim
> >> > > +  */
> >> > > + if (order && order <= PAGE_ALLOC_COSTLY_ORDER)
> >> > > +         return true;
> >> > > +
> >> > >   return false;
> >>
> >> This seems not a proper fix. Checking watermark with high order has
> >> another meaning that there is high order page or not. This isn't
> >> what we want here.
> >
> > Why not? Why should we retry the reclaim if we do not have >=order page
> > available? Reclaim itself doesn't guarantee any of the freed pages will
> > form the requested order. The ordering on the LRU lists is pretty much
> > random wrt. pfn ordering. On the other hand if we have a page available
> > which is just hidden by watermarks then it makes perfect sense to retry
> > and free even order-0 pages.
> 
> If we have >= order page available, we would not reach here. We would
> just allocate it.

not really, we can still be under the low watermark. Note that the
target for the should_reclaim_retry watermark check includes also the
reclaimable memory.
 
> And, should_reclaim_retry() is not just for reclaim. It is also for
> retrying compaction.
> 
> That watermark check is to check further reclaim/compaction
> is meaningful. And, for high order case, if there is enough freepage,
> compaction could make high order page even if there is no high order
> page now.
> 
> Adding freeable memory and checking watermark with it doesn't help
> in this case because number of high order page isn't changed with it.
> 
> I just did quick review to your patches so maybe I am wrong.
> Am I missing something?

The core idea behind should_reclaim_retry is to check whether the
reclaiming all the pages would help to get over the watermark and there
is at least one >= order page. Then it really makes sense to retry. As
the compaction has already was performed before this is called we should
have created some high order pages already. The decay guarantees that we
eventually trigger the OOM killer after some attempts.

If the compaction can backoff and ignore our requests then we are
screwed of course and that should be addressed imho at the compaction
layer. Maybe we can tell the compaction to try harder but I would like
to understand why this shouldn't be a default behavior for !costly
orders.
 
[...]
> >> > > @@ -3281,11 +3293,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >> > >           goto noretry;
> >> > >
> >> > >   /*
> >> > > -  * Costly allocations might have made a progress but this doesn't mean
> >> > > -  * their order will become available due to high fragmentation so do
> >> > > -  * not reset the no progress counter for them
> >> > > +  * High order allocations might have made a progress but this doesn't
> >> > > +  * mean their order will become available due to high fragmentation so
> >> > > +  * do not reset the no progress counter for them
> >> > >    */
> >> > > - if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
> >> > > + if (did_some_progress && !order)
> >> > >           no_progress_loops = 0;
> >> > >   else
> >> > >           no_progress_loops++;
> >>
> >> This unconditionally increases no_progress_loops for high order
> >> allocation, so, after 16 iterations, it will fail. If compaction isn't
> >> enabled in Kconfig, 16 times reclaim attempt would not be sufficient
> >> to make high order page. Should we consider this case also?
> >
> > How many retries would help? I do not think any number will work
> > reliably. Configurations without compaction enabled are asking for
> > problems by definition IMHO. Relying on order-0 reclaim for high order
> > allocations simply cannot work.
> 
> At least, reset no_progress_loops when did_some_progress. High
> order allocation up to PAGE_ALLOC_COSTLY_ORDER is as important
> as order 0. And, reclaim something would increase probability of
> compaction success.

This is something I still do not understand. Why would reclaiming
random order-0 pages help compaction? Could you clarify this please?

> Why do we limit retry as 16 times with no evidence of potential
> impossibility of making high order page?

If we tried to compact 16 times without any progress then this sounds
like a sufficient evidence to me. Well, this number is somehow arbitrary
but the main point is to limit it to _some_ number, if we can show that
a larger value would work better then we can update it of course.

> And, 16 retry looks not good to me because compaction could defer
> actual doing up to 64 times.

OK, this is something that needs to be handled in a better way. The
primary question would be why to defer the compaction for <=
PAGE_ALLOC_COSTLY_ORDER requests in the first place. I guess I do see
why it makes sense it for the best effort mode of operation but !costly
orders should be trying much harder as they are nofail, no?

Thanks!
-- 
Michal Hocko
SUSE Labs