Message-Id: <201603120149.JEI86913.JVtSOOFHMFFQOL@I-love.SAKURA.ne.jp>
Date: Sat, 12 Mar 2016 01:49:26 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: mhocko@...nel.org
Cc: akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
hannes@...xchg.org, mgorman@...e.de, rientjes@...gle.com,
hillf.zj@...baba-inc.com, kamezawa.hiroyu@...fujitsu.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] OOM detection rework v4
Michal Hocko wrote:
> On Fri 11-03-16 22:32:02, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 11-03-16 19:45:29, Tetsuo Handa wrote:
> > > > (Posting as a reply to this thread.)
> > >
> > > I really do not see how this is related to this thread.
> >
> > All allocating tasks are looping at
> >
> > 	/*
> > 	 * If we didn't make any progress and have a lot of
> > 	 * dirty + writeback pages then we should wait for
> > 	 * an IO to complete to slow down the reclaim and
> > 	 * prevent from pre mature OOM
> > 	 */
> > 	if (!did_some_progress && 2*(writeback + dirty) > reclaimable) {
> > 		congestion_wait(BLK_RW_ASYNC, HZ/10);
> > 		return true;
> > 	}
> >
> > in should_reclaim_retry().
> >
> > should_reclaim_retry() was added by the OOM detection rework, wasn't it?
>
> What happens without this patch applied? In other words, it all smells
> like the IO got stuck somewhere and the direct reclaim cannot perform it,
> so we have to wait for the flushers to make progress for us. Are those
> stuck? Is the IO making any progress at all, or is it just too slow and
> would it actually finish? Wouldn't we just wait somewhere else in the
> direct reclaim path instead?
As of next-20160311, CPU usage becomes 0% when this problem occurs.
If I remove
mm-use-watermak-checks-for-__gfp_repeat-high-order-allocations-checkpatch-fixes
mm: use watermark checks for __GFP_REPEAT high order allocations
mm: throttle on IO only when there are too many dirty and writeback pages
mm-oom-rework-oom-detection-checkpatch-fixes
mm, oom: rework oom detection
then CPU usage rises to 60% and most of the allocating tasks
are looping at
	/*
	 * Acquire the oom lock. If that fails, somebody else is
	 * making progress for us.
	 */
	if (!mutex_trylock(&oom_lock)) {
		*did_some_progress = 1;
		schedule_timeout_uninterruptible(1);
		return NULL;
	}
in __alloc_pages_may_oom() (i.e. an OOM livelock, because the OOM reaper is disabled).