Message-ID: <alpine.DEB.2.00.1108291611070.32495@chino.kir.corp.google.com>
Date: Mon, 29 Aug 2011 16:17:55 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Oleg Nesterov <oleg@...hat.com>, Ying Han <yinghan@...gle.com>
cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
roland@...k.frob.com, tj@...nel.org, dvlasenk@...hat.com,
matt.fleming@...ux.intel.com, linux-kernel@...r.kernel.org,
avagin@...nvz.org, fhrbata@...hat.com
Subject: Re: mm->oom_disable_count is broken

On Mon, 29 Aug 2011, Oleg Nesterov wrote:

> > IIRC, I did point out this issue, but nobody replied.
> > I think ->oom_disable_count is currently broken, but I have no time right
> > now to audit this stuff. So I'd suggest reverting this code if nobody fixes it.
>
> I tend to agree. Of course we can fix oom_disable_count, but I don't
> really understand why we want it.
>
I'd rather just remove it entirely, but we'll have to ask its author. Ying,
do you see a reason to keep oom_disable_count around?

The only thing that I can see it doing is preventing a thread that shares
an ->mm with an unkillable thread from being killed itself, since killing it
won't lead to future memory freeing. It saves the second tasklist iteration,
after a task has been chosen, that would otherwise check whether another
thread sharing the memory cannot be killed.
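
To make the trade-off concrete, here is a toy user-space model, not kernel code. The toy_mm and toy_task structs and all three helpers are hypothetical names made up for illustration; only OOM_SCORE_ADJ_MIN matches the kernel's constant. It contrasts reading a per-mm counter with walking the task list a second time, and shows that the counter is only right if every change goes through the bookkeeping path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)   /* score that makes a task unkillable */

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct toy_mm {
    int oom_disable_count;          /* number of unkillable users of this mm */
};

struct toy_task {
    struct toy_mm *mm;
    int oom_score_adj;
};

/* Bookkeeping path: keep the counter in sync whenever a task crosses
 * into or out of the unkillable state. */
static void toy_set_oom_score_adj(struct toy_task *t, int adj)
{
    bool was_disabled = (t->oom_score_adj == OOM_SCORE_ADJ_MIN);
    bool now_disabled = (adj == OOM_SCORE_ADJ_MIN);

    if (!was_disabled && now_disabled) {
        t->mm->oom_disable_count++;
    } else if (was_disabled && !now_disabled) {
        assert(t->mm->oom_disable_count > 0);
        t->mm->oom_disable_count--;
    }
    t->oom_score_adj = adj;
}

/* With the counter: a single read answers the question. */
static bool mm_unkillable_counted(const struct toy_mm *mm)
{
    return mm->oom_disable_count > 0;
}

/* Without the counter: a second pass over every task, looking for an
 * unkillable one that shares the chosen victim's mm. */
static bool mm_unkillable_scan(const struct toy_task *tasks, size_t n,
                               const struct toy_mm *mm)
{
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].mm == mm && tasks[i].oom_score_adj == OOM_SCORE_ADJ_MIN)
            return true;
    }
    return false;
}
```

In this model the two checks agree only while every update goes through toy_set_oom_score_adj(); any path that changes a task's mm or score without adjusting the counter leaves it stale.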

I'd rather just kill the thread anyway: there's a chance that the
OOM_DISABLE thread is waiting on it and may free its memory as well, and
there's no guarantee that, when you set a thread to be OOM_DISABLE, all
threads sharing the same memory are disabled as well.

> And personally I dislike it because ->oom_disable_count is just another
> proof that ->oom_score_adj should be in ->mm, not per-process. IIRC,
> you already explained to me why we can't do this, but - sorry - I forgot.
> Maybe something with vfork... Could you explain this again?
>
I actually really wanted oom_score_adj to be in the ->mm, it would
simplify a lot of the code :) The problem was the inheritance property:
we expect a job scheduler that is OOM_DISABLE to be able to vfork, change
the oom_score_adj of the child, and then exec so that it is not oom
disabled before starting to allocate memory. If this were in the mm, then
setting the oom_score_adj of the child prior to exec would change the job
scheduler's oom score as well.
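
For reference, a minimal user-space sketch of that pattern. Both helper names, set_own_oom_score_adj() and spawn_killable(), are hypothetical and error handling is kept short. It relies on Linux behaviour: POSIX only lets a vfork child call _exit() or an exec function, but on Linux the child can also write to its own /proc/self/oom_score_adj while it still shares the parent's mm.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write adj to the calling process's /proc/self/oom_score_adj.
 * Returns 0 on success, -1 on failure. */
static int set_own_oom_score_adj(int adj)
{
    char buf[16];
    int len = snprintf(buf, sizeof(buf), "%d", adj);
    assert(len > 0 && (size_t)len < sizeof(buf));

    int fd = open("/proc/self/oom_score_adj", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, buf, (size_t)len);
    close(fd);
    return written == (ssize_t)len ? 0 : -1;
}

/* Hypothetical job-scheduler helper: start argv[0] with its own
 * oom_score_adj. Returns the child's pid, or -1 if vfork() fails. */
static pid_t spawn_killable(char *const argv[], int child_adj)
{
    pid_t pid = vfork();
    if (pid == 0) {
        /* Child: still sharing the parent's mm until exec. Because the
         * score is per-process, this write does not touch the parent. */
        if (set_own_oom_score_adj(child_adj) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    return pid;                 /* parent resumes after the child execs or exits */
}
```

A scheduler that is itself OOM_DISABLE would call something like spawn_killable(argv, 0) and later reap the child with waitpid(). With the score stored in the mm, the child's write in the window between vfork and exec would land on the scheduler too.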