linux-kernel - Re: [PATCH 2/6] mm,oom: don't abort on exiting processes when selecting a victim.

Message-Id: <201602182021.EEH86916.JOLtFFVHOOMQFS@I-love.SAKURA.ne.jp>
Date:	Thu, 18 Feb 2016 20:21:01 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...nel.org
Cc:	akpm@...ux-foundation.org, rientjes@...gle.com, mgorman@...e.de,
	oleg@...hat.com, torvalds@...ux-foundation.org, hughd@...gle.com,
	andrea@...nel.org, riel@...hat.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/6] mm,oom: don't abort on exiting processes when selecting a victim.

Michal Hocko wrote:
> > We want to teach the OOM reaper to
> > operate whenever TIF_MEMDIE is set. But this means that we want
> > mm_is_reapable() check because there might be !SIGKILL && !PF_EXITING
> > threads when we run these optimized paths.
>
> > We will need to use timer if mm_is_reapable() == false after all.
>
> Or we should re-evaluate those heuristics for multithreaded processes.

The TIF_MEMDIE heuristics operate on a per-task_struct basis, but the
OOM-kill operation works on a per-signal_struct or per-mm_struct basis.

Since we set TIF_MEMDIE on only one thread (on the wrong assumption that
the remaining threads will get TIF_MEMDIE via fatal_signal_pending()),
we keep running into corner cases.

> Does it even make sense to shortcut and block the OOM killer if the
> single thread is exiting?

Do we check for clone(!CLONE_SIGHAND && CLONE_VM) threads (i.e. walk the
process list) to determine whether it is really a single thread?
That would be mm_is_reapable().
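
For illustration only, here is a rough user-space sketch of what such a
check might look like. It is a toy model, not kernel code: struct toy_task,
its fields and toy_mm_is_reapable() are hypothetical names, and the rule
"every user of the mm must have SIGKILL pending or be exiting" is one
reading of the "!SIGKILL && !PF_EXITING" remark quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; these are NOT the kernel's data structures. */
struct toy_task {
	int mm_id;            /* address space in use; CLONE_VM tasks share it */
	int sighand_id;       /* differs for tasks cloned without CLONE_SIGHAND */
	bool sigkill_pending; /* stands in for a pending SIGKILL */
	bool exiting;         /* stands in for PF_EXITING */
};

/*
 * Hypothetical helper: walk the whole task list and report whether every
 * user of mm_id is already dying. The walk deliberately ignores
 * sighand_id, because a task sharing only the mm still pins the memory.
 */
static bool toy_mm_is_reapable(const struct toy_task *tasks, size_t n,
			       int mm_id)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].mm_id != mm_id)
			continue;
		/* One live user is enough to make the mm unsafe to reap. */
		if (!tasks[i].sigkill_pending && !tasks[i].exiting)
			return false;
	}
	return true;
}
```

The point of the sketch is that the answer depends on all tasks sharing the
mm, not on the one thread that happens to be exiting.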

>                           Only very small amount of memory gets released
> during its exit anyway.

Currently exit_mm() is called before exit_files() etc.
Can we expect even a single page of memory to be released when such a
thread gets stuck at down_read(&mm->mmap_sem)?

>                         Don't we want to catch only the group exit to
> catch fatal_signal_pending -> exit_signals -> exit_mm -> allocation
> cases? I am not really sure what to check for, to be honest though.
>

I don't know what this line is saying.

> > Why don't you accept timer based workaround now, even if you have a plan
> > to update the OOM reaper for handling these optimized paths?
>
> Because I believe that the timeout based solutions are distracting from
> a proper solution which would be based on actual algorithm/heuristic that
> can be measured and evaluated. And because I can see future discussion
> of whether $FOO or $BAR is a better timeout... I really do not see any
> reason to rush into quick solutions now.

OOM-livelock bugs are caused by over-throttling based on optimistic
assumptions. [PATCH 5/6] unthrottles in order to guarantee forward
progress (and eventually triggers a kernel panic if there are no more
OOM-killable processes).
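
As a rough illustration of the idea (not the actual [PATCH 5/6] code,
which is not shown in this message), the decision could be modelled in
user space as below. The enum, toy_oom_decide() and its parameters are
hypothetical, and the time values are plain counters that ignore
wraparound.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model of the unthrottling decision; hypothetical names. */
enum toy_oom_action { TOY_WAIT, TOY_KILL_NEXT, TOY_PANIC };

static enum toy_oom_action toy_oom_decide(bool victim_in_flight,
					  unsigned long now,
					  unsigned long deadline,
					  size_t killable_left)
{
	/* Throttle only while a victim exists and its deadline has not passed. */
	if (victim_in_flight && now < deadline)
		return TOY_WAIT;

	/*
	 * Nothing is in flight, or the victim failed to exit in time:
	 * pick another victim if one exists, otherwise give up and panic.
	 */
	return killable_left > 0 ? TOY_KILL_NEXT : TOY_PANIC;
}
```

In this model the throttled state can never last past the deadline, which
is the forward-progress guarantee being argued for.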

I don't expect any future discussion of whether $FOO or $BAR is a better
timeout, because timeout-based unthrottling should seldom occur. Even
without the OOM reaper, more than, say, 99% of innocent OOM events would
resolve the OOM condition before this timeout expires. After we merge the
OOM reaper, more than, say, 99% of malicious OOM events would resolve the
OOM condition before this timeout expires. Who could gather the data
needed to discuss whether $FOO or $BAR is a better timeout? Only those
who want to explore that, say, 1% possibility, and those who hate any
timeout, would want to disable this timeout.

If we make sure that timeout-based unthrottling guarantees forward
progress, we can try to use memory reserves more aggressively.
For example, we can set TIF_MEMDIE on all fatal_signal_pending() threads
that use the mm_struct chosen by the OOM killer. This will eliminate the
wrong assumption that the remaining threads will get TIF_MEMDIE due to
fatal_signal_pending(). We have been too timid about using memory
reserves because we currently have no means to refill them.
If timeout-based unthrottling kills the next OOM victim (and the OOM
reaper reaps it), we can overcommit memory reserves (like we overcommit
normal memory).
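
A toy user-space sketch of that marking step follows, assuming a flat
array of tasks. struct toy_thread, its fields and toy_mark_victims() are
hypothetical stand-ins for task_struct, TIF_MEMDIE and the kernel's
process-list walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; these are NOT the kernel's data structures. */
struct toy_thread {
	int mm_id;                 /* address space the thread uses */
	bool fatal_signal_pending; /* stands in for fatal_signal_pending() */
	bool memdie;               /* stands in for TIF_MEMDIE */
};

/*
 * Hypothetical helper: grant reserve access to every dying thread that
 * uses the chosen mm, instead of to a single thread. Returns how many
 * threads were newly marked.
 */
static size_t toy_mark_victims(struct toy_thread *threads, size_t n,
			       int victim_mm)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (threads[i].mm_id != victim_mm)
			continue;
		if (!threads[i].fatal_signal_pending || threads[i].memdie)
			continue;
		threads[i].memdie = true;
		marked++;
	}
	return marked;
}
```

Marking by mm rather than by thread removes the dependence on each
remaining thread noticing its own fatal signal later.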

I don't think we can manage without timeout based solutions.
I really do not see any reason not to accept [PATCH 5/6] now.