linux-kernel - Re: [RFC 2/4] mm: replace TIF_MEMDIE checks by tsk_is_oom

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sun, 4 Sep 2016 10:49:52 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     mhocko@...nel.org, linux-mm@...ck.org
Cc:     rientjes@...gle.com, hannes@...xchg.org, akpm@...ux-foundation.org,
        linux-kernel@...r.kernel.org, mhocko@...e.com
Subject: Re: [RFC 2/4] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim

Michal Hocko wrote:
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9ee178ba7b71..df58733ca48e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1899,7 +1899,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	 * bypass the last charges so that they can exit quickly and
>  	 * free their memory.
>  	 */
> -	if (unlikely(test_thread_flag(TIF_MEMDIE) ||
> +	if (unlikely(tsk_is_oom_victim(current) ||
>  		     fatal_signal_pending(current) ||
>  		     current->flags & PF_EXITING))
>  		goto force;

Does this test_thread_flag(TIF_MEMDIE) (or tsk_is_oom_victim(current)) make sense?

If current thread is OOM-killed, SIGKILL must be pending before arriving at
do_exit() and PF_EXITING must be set after arriving at do_exit(). But I can't
find locations which do memory allocation between clearing SIGKILL and setting
PF_EXITING.

When can test_thread_flag(TIF_MEMDIE) == T (or tsk_is_oom_victim(current) == T) &&
fatal_signal_pending(current) == F && (current->flags & PF_EXITING) == 0 happen?

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b11977585c7b..e26529edcee3 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1078,7 +1078,7 @@ void pagefault_out_of_memory(void)
>  		 * be a racing OOM victim for which oom_killer_disable()
>  		 * is waiting for.
>  		 */
> -		WARN_ON(test_thread_flag(TIF_MEMDIE));
> +		WARN_ON(tsk_is_oom_victim(current));
>  	}
>  
>  	mutex_unlock(&oom_lock);

Does this WARN_ON() make sense?

When some user thread called oom_killer_disable(), there are running
kernel threads but is no running user threads except the one which
called oom_killer_disable(). Since oom_killer_disable() waits for
oom_lock, out_of_memory() called from here shall not return false
before oom_killer_disable() sets oom_killer_disabled = true. Thus,
possible situation out_of_memory() called from here can return false
are limited to

 (a) the one which triggered pagefaults after returning from
     oom_killer_disable()
 (b) An OOM victim which was thawed triggered pagefaults from do_exit()
     after the one which called oom_killer_disable() released oom_lock
 (c) kernel threads which triggered pagefaults after use_mm()

. And since kernel threads are not subjected to mark_oom_victim(),
test_thread_flag(TIF_MEMDIE) == F (or tsk_is_oom_victim(current) == F)
for kernel threads. Thus, possible situation out_of_memory() called from
here can return false and we hit this WARN_ON() is limited to (b).

Even if (a) or (b) is possible, does continuously emitting backtraces
help? It seems to me that the system is under OOM livelock situation and
we need to take a different action (e.g. try to allocate a page using
ALLOC_NO_WATERMARKS in order to make the pagefault be solved, and panic()
if failed) than emitting same backtraces forever.