Message-Id: <201509260114.ADI35946.OtHOVFOMJQFLFS@I-love.SAKURA.ne.jp>
Date: Sat, 26 Sep 2015 01:14:50 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: mhocko@...nel.org, rientjes@...gle.com
Cc: oleg@...hat.com, torvalds@...ux-foundation.org, kwalker@...hat.com,
cl@...ux.com, akpm@...ux-foundation.org, hannes@...xchg.org,
vdavydov@...allels.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, skozina@...hat.com
Subject: Re: can't oom-kill zap the victim's memory?
Michal Hocko wrote:
> On Thu 24-09-15 14:15:34, David Rientjes wrote:
> > > > Finally. Whatever we do, we need to change oom_kill_process() first,
> > > > and I think we should do this regardless. The "Kill all user processes
> > > > sharing victim->mm" logic looks wrong and suboptimal/overcomplicated.
> > > > I'll try to make some patches tomorrow if I have time...
> > >
> > > That would be appreciated. I do not like that part either. At least we
> > > shouldn't go over the whole list when we have a good chance that the mm
> > > is not shared with other processes.
> > >
> >
> > Heh, it's actually imperative to avoid livelocking based on mm->mmap_sem,
> > it's the reason the code exists. Any optimizations to that is certainly
> > welcome, but we definitely need to send SIGKILL to all threads sharing the
> > mm to make forward progress, otherwise we are going back to pre-2008
> > livelocks.
>
> Yes but mm is not shared between processes most of the time. CLONE_VM
> without CLONE_THREAD is more a corner case yet we have to crawl all the
> task_structs for _each_ OOM killer invocation. Yes this is an extreme
> slow path but still might take quite some unnecessarily time.
Excuse me, but thinking about the CLONE_VM without CLONE_THREAD case...
Isn't there a possibility of hitting livelocks at
	/*
	 * If current has a pending SIGKILL or is exiting, then automatically
	 * select it. The goal is to allow it to allocate so that it may
	 * quickly exit and free its memory.
	 *
	 * But don't select if current has already released its mm and cleared
	 * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
	 */
	if (current->mm &&
	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
		mark_oom_victim(current);
		return true;
	}
if the current thread receives SIGKILL just before reaching here, because this
path does not send SIGKILL to all threads sharing the mm?
Hopefully the current thread is not holding inode->i_mutex, because reaching
here (i.e. calling out_of_memory()) suggests that we are doing a GFP_KERNEL
allocation. But could it be a !__GFP_FS && __GFP_NOFAIL allocation, or could
a different lock be contended by another thread sharing the mm?
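To make the concern concrete, here is a toy user-space model of the shortcut
quoted above. It is only a sketch: the struct layouts, field names and helper
names are hypothetical stand-ins, not the kernel's definitions. It shows that
the shortcut marks only the calling task, so another task sharing the same mm
(CLONE_VM without CLONE_THREAD) is left without a pending SIGKILL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the kernel's struct definitions. */
struct model_mm {
	int id;
};

struct model_task {
	struct model_mm *mm;	/* address space, possibly shared via CLONE_VM */
	bool sigkill_pending;	/* stands in for fatal_signal_pending() */
	bool exiting;		/* stands in for task_will_free_mem() */
	bool oom_victim;	/* stands in for the TIF_MEMDIE flag */
};

/* Models the quoted shortcut: only the calling task is marked as victim. */
static bool model_oom_shortcut(struct model_task *cur)
{
	if (cur->mm && (cur->sigkill_pending || cur->exiting)) {
		cur->oom_victim = true;
		return true;
	}
	return false;
}

/* Counts tasks that use the given mm but have no SIGKILL pending. */
static size_t model_unkilled_sharers(const struct model_task *tasks, size_t n,
				     const struct model_mm *mm)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].mm == mm && !tasks[i].sigkill_pending)
			count++;
	return count;
}
```

In this model, calling model_oom_shortcut() on one of two tasks sharing an mm
returns true, yet model_unkilled_sharers() still reports the other task. If
that other task holds a lock the victim needs, nothing forces it to exit.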
I don't like the "That thread will now get access to memory reserves since it
has a pending fatal signal." line in the comment for the "Kill all user
processes sharing victim->mm" logic. That thread won't get access to memory
reserves unless it can call out_of_memory() (i.e. it is doing a __GFP_FS or
__GFP_NOFAIL allocation). Since I can observe that that thread may be doing a
!__GFP_FS allocation, I think that this comment needs to be updated.
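The condition described above can be sketched as a small user-space model.
This is an illustration of the argument in this mail, not kernel code: the
MODEL_* flag values and both helper names are hypothetical, and the real
allocator path has more cases than are modelled here.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
#define MODEL_GFP_FS		0x1u
#define MODEL_GFP_NOFAIL	0x2u

/*
 * True if, per the paragraph above, the allocation is one that can end up
 * calling out_of_memory(): it has __GFP_FS or __GFP_NOFAIL set.
 */
static bool model_can_call_oom(unsigned int gfp)
{
	return (gfp & (MODEL_GFP_FS | MODEL_GFP_NOFAIL)) != 0;
}

/*
 * In this model a pending SIGKILL alone is not enough: the task must also
 * reach out_of_memory() to be marked and gain access to memory reserves.
 */
static bool model_gets_reserves(bool sigkill_pending, unsigned int gfp)
{
	return sigkill_pending && model_can_call_oom(gfp);
}
```

Under these assumptions, a task with SIGKILL pending that is doing an
allocation with neither flag set gets model_gets_reserves() == false, which
is the case the existing comment does not cover.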