linux-kernel - Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap

Message-Id: <201712072000.FCE30281.FOFHOOtVMQLJFS@I-love.SAKURA.ne.jp>
Date:   Thu, 7 Dec 2017 20:00:58 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     mhocko@...nel.org
Cc:     rientjes@...gle.com, akpm@...ux-foundation.org,
        aarcange@...hat.com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap

Michal Hocko wrote:
> Hmm, so you are creating a separate process (from the signal point of
> view) and I suspect it is one of those that holds the last reference to
> the mm_struct which is released here and it has tsk_oom_victim = F

Right.

> So we need a more robust test for the oom victim. Your suggestion is
> basically what I came up with originally [1] and which was deemed
> ineffective because we took the mmap_sem even for regular paths and
> Kirill was afraid this adds some unnecessary cycles to the exit path
> which is quite hot.
> 
> So I guess we have to do something else instead. We have to store the
> oom flag to the mm struct as well. Something like the patch below.

Yes, adding a new flag for this purpose will work.

Alternatively, setting the MMF_UNSTABLE flag after sending SIGKILL but before
victim->mm becomes NULL, and testing MMF_UNSTABLE at exit_mm(), should also work.
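
To illustrate why a flag stored in the mm works where a per-task test does not,
here is a rough userspace sketch. It is not kernel code: every model_* name is
made up for this example, and a C11 atomic stands in for the kernel's bit
operations on mm->flags. The point it tries to show is only that a task sharing
the mm sees the flag even though it was never marked as an OOM victim itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names and layout are assumptions. */
enum { MODEL_MMF_UNSTABLE = 1u << 0 };

struct model_mm {
    atomic_uint flags;      /* stands in for mm->flags */
};

struct model_task {
    struct model_mm *mm;    /* NULL once the task has dropped its mm */
    bool oom_victim;        /* per-task marker, false for a mere sharer */
};

/*
 * Mark the victim's mm. Returns false if the victim has already
 * dropped its mm, in which case there is nothing left to flag.
 */
static bool model_mark_victim_mm(struct model_task *victim)
{
    assert(victim != NULL);
    if (victim->mm == NULL)
        return false;
    atomic_fetch_or(&victim->mm->flags, MODEL_MMF_UNSTABLE);
    return true;
}

/*
 * Exit-path test: look at the mm, not at the exiting task, so that
 * whichever task drops the last reference notices the flag.
 */
static bool model_exit_sees_unstable(struct model_task *tsk)
{
    assert(tsk != NULL);
    if (tsk->mm == NULL)
        return false;
    return (atomic_load(&tsk->mm->flags) & MODEL_MMF_UNSTABLE) != 0;
}
```

In this model, a second task that shares the mm but has oom_victim == false
still gets true from model_exit_sees_unstable() once the victim's mm has been
marked, which is the property the per-task check lacks.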

But I prefer the simple revert + mmget()/mmput_async() approach at
http://lkml.kernel.org/r/201712062037.DAF90168.SVFQOJFMOOtHLF@I-love.SAKURA.ne.jp ,
because it not only saves lines but also fixes the unexpected change for nommu at
http://lkml.kernel.org/r/201711091949.BDB73475.OSHFOMQtLFOFVJ@I-love.SAKURA.ne.jp .
Also, if we replace asynchronous OOM reaping by the OOM reaper kernel thread with
synchronous OOM reaping by the OOM killer, we can close the MMF_OOM_SKIP race window,
because __oom_reap_task_mm() is then guaranteed to be called before __mmput() is
called.
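
The ordering argument can be sketched in userspace as follows. This is a
single-threaded toy, not the patch at the URL above, and all sketch_* names are
hypothetical. It assumes the killer pins the mm with an extra reference before
reaping, so that the final put (the stand-in for __mmput()) cannot happen until
the reap step has finished, even if the victim drops its own reference first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not kernel code. */
struct sketch_mm {
    int users;                    /* models mm_users */
    bool reaped;                  /* set once the reap step has run */
    bool torn_down;               /* set once the final put has run */
    bool reaped_before_teardown;  /* snapshot of 'reaped' at teardown time */
};

/* Take an extra reference; only valid while someone still holds one. */
static void sketch_mmget(struct sketch_mm *mm)
{
    assert(mm != NULL && mm->users > 0);
    mm->users++;
}

/* Drop a reference; the last put performs the teardown. */
static void sketch_mmput(struct sketch_mm *mm)
{
    assert(mm != NULL && mm->users > 0);
    if (--mm->users == 0) {
        mm->reaped_before_teardown = mm->reaped;
        mm->torn_down = true;
    }
}

/* Reap step; must never run on an mm that is already torn down. */
static void sketch_reap(struct sketch_mm *mm)
{
    assert(mm != NULL && !mm->torn_down);
    mm->reaped = true;
}

/* Synchronous reaping by the killer: pin, reap, unpin. */
static void sketch_oom_kill_sync(struct sketch_mm *mm)
{
    sketch_mmget(mm);   /* pin the mm before the victim can release it */
    sketch_mmput(mm);   /* simulate the victim exiting while we hold the pin */
    sketch_reap(mm);    /* reap while teardown is still impossible */
    sketch_mmput(mm);   /* our put is now the last one and tears down */
}
```

Starting from users == 1 (the victim only), sketch_oom_kill_sync() ends with
torn_down and reaped_before_teardown both true, since the victim's put cannot
be the last one while the killer's reference is held.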