linux-kernel - Re: [PATCH 0/10 -v3] Handle oom bypass more gracefully

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <201606080649.DGF51523.FLMOSHVtFFOJOQ@I-love.SAKURA.ne.jp>
Date:	Wed, 8 Jun 2016 06:49:24 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...nel.org
Cc:	linux-mm@...ck.org, rientjes@...gle.com, oleg@...hat.com,
	vdavydov@...allels.com, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/10 -v3] Handle oom bypass more gracefully

Michal Hocko wrote:
> OK, so you are arming the timer for each mark_oom_victim regardless
> of the oom context. This means that you have replaced one potential
> lockup by other potential livelocks. Tasks from different oom domains
> might interfere here...
> 
> Also this code doesn't even seem easier. It is surely less lines of
> code but it is really hard to realize how would the timer behave for
> different oom contexts.

If you worry about interference, we can use per signal_struct timestamp.
I used per task_struct timestamp in my earlier versions (where per
task_struct TIF_MEMDIE check was used instead of per signal_struct
oom_victims).

> > What is wrong with above patch? How much difference is there compared to
> > calling schedule_timeout_killable(HZ) in oom_kill_process() before
> > releasing oom_lock and later checking MMF_OOM_REAPED after re-taking
> > oom_lock when we can't wake up the OOM reaper?
> 
> I fail to see how much this is different, really. Your patch is checking
> timer_pending with a global context in the same path and that is imho
> much harder to argue about than something which is task->mm based.

We can use per signal_struct or per task_struct timestamp if you don't
like global timestamp.

> > I'm OK with "a decision based by a feedback" but you don't like waking up
> > the OOM reaper ("invoking the oom reaper just to find out what we know
> > already and it is unlikely to change after oom_kill_process just doesn't
> > make much sense."). So what feedback mechanisms are possible other than
> > timeout like above patch?
> 
> Is this about the patch 10? Well, yes, there is a case where oom reaper
> cannot be invoked and we have no feedback. Then we have no other way
> than to wait for some time. I believe it is easier to wait in the oom
> context directly than to add a global timer. Both approaches would need
> some code in the oom victim selection code and it is much easier to
> argue about the victim specific context than a global one as mentioned
> above.

But expiring timeout by sleeping inside oom_kill_process() prevents other
threads which are OOM-killed from obtaining TIF_MEMDIE, for anybody needs
to wait for oom_lock in order to obtain TIF_MEMDIE. Unless you set
TIF_MEMDIE to all OOM-killed threads from oom_kill_process() or allow
the caller context to use ALLOC_NO_WATERMARKS by checking whether current
was already OOM-killed rather than TIF_MEMDIE, attempt to expiring timeout
by sleeping inside oom_kill_process() is useless.