linux-kernel - Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit

Message-Id: <201706272326.BAG00561.LMJVHSFQtOOFFO@I-love.SAKURA.ne.jp>
Date:   Tue, 27 Jun 2017 23:26:22 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     mhocko@...nel.org
Cc:     linux-mm@...ck.org, rientjes@...gle.com, oleg@...hat.com,
        andrea@...nel.org, akpm@...ux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm, oom: allow oom reaper to race with exit_mmap

Michal Hocko wrote:
> On Tue 27-06-17 22:31:58, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 27-06-17 20:39:28, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > > I wonder why you prefer timeout based approach. Your patch will after all
> > > > > > set MMF_OOM_SKIP if operations between down_write() and up_write() took
> > > > > > more than one second.
> > > > > 
> > > > > if we reach down_write then we have unmapped the address space in
> > > > > exit_mmap and oom reaper cannot do much more.
> > > > 
> > > > So, by the time down_write() is called, majority of memory is already released, isn't it?
> > > 
> > > In most cases yes. To put it in other words: by the time exit_mmap
> > > takes down_write there is nothing more the oom reaper could reclaim.
> > > 
> > Then, aren't there two exceptions which your patch cannot guarantee;
> > down_write(&mm->mmap_sem) in __ksm_exit() and __khugepaged_exit() ?
> 
> yes it cannot. Those would be quite rare situations. Somebody holding
> the mmap sem would have to block those to wait for too long (that too
> long might be for ever actually if we are livelocked). We cannot rule
> that out of course and I would argue that it would be more appropriate
> to simply go after another task in those rare cases. There is not much
> we can really do. At some point the oom reaper has to give up and move
> on otherwise we are back to square one when OOM could deadlock...
> 
> Maybe we can actually get rid of this down_write but I would go that way
> only when it proves to be a real issue.
> 
> > Since for some reason exit_mmap() cannot be brought to before
> > ksm_exit(mm)/khugepaged_exit(mm) calls,
> 
> 9ba692948008 ("ksm: fix oom deadlock") would tell you more about the
> ordering and the motivation.

I understand neither ksm nor khugepaged. But that commit actually called
ksm_exit() just before free_pgtables() in exit_mmap(). It was ba76149f47d8c939
("thp: khugepaged") which added the /* must run before exit_mmap */ comment.

> 
> > 
> > 	ksm_exit(mm);
> > 	khugepaged_exit(mm); /* must run before exit_mmap */
> > 	exit_mmap(mm);
> > 
> > shouldn't we try __oom_reap_task_mm() before calling these down_write()
> > if mm is OOM victim's?
> 
> This is what we try. We simply try to get mmap_sem for read and do our
> work as soon as possible with the proposed patch. This is already an
> improvement, no?

We can ask the OOM reaper kernel thread to try to reap before the OOM killer
releases the oom_lock mutex, but that is not guaranteed to happen. It is
possible that the OOM victim thread runs until down_write() in __ksm_exit() or
__khugepaged_exit(), and only then does the OOM reaper kernel thread start
calling down_read_trylock().