linux-kernel - Re: can't oom-kill zap the victim's memory?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <201509222330.JDI64510.FOLOFQStMVFJOH@I-love.SAKURA.ne.jp>
Date:	Tue, 22 Sep 2015 23:30:06 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	oleg@...hat.com
Cc:	mhocko@...nel.org, torvalds@...ux-foundation.org,
	kwalker@...hat.com, cl@...ux.com, akpm@...ux-foundation.org,
	rientjes@...gle.com, hannes@...xchg.org, vdavydov@...allels.com,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	skozina@...hat.com
Subject: Re: can't oom-kill zap the victim's memory?

Oleg Nesterov wrote:
> On 09/22, Tetsuo Handa wrote:
> >
> > I imagined a dedicated kernel thread doing something like shown below.
> > (I don't know about mm->mmap management.)
> > mm->mmap_zapped corresponds to MMF_MEMDIE.
> 
> No, it doesn't, please see below.
> 
> > bool has_sigkill_task;
> > wait_queue_head_t kick_mm_zapper;
> 
> OK, if this kthread is kicked by oom this makes more sense, but still
> doesn't look right at least initially.

Yes, I meant this kthread is kicked upon sending SIGKILL. But I forgot that

> 
> Let me repeat, I do think we need MMF_MEMDIE or something like it before
> we do something more clever. And in fact I think this flag makes sense
> regardless.
> 
> > static void mm_zapper(void *unused)
> > {
> > 	struct task_struct *g, *p;
> > 	struct mm_struct *mm;
> >
> > sleep:
> > 	wait_event(kick_remover, has_sigkill_task);
> > 	has_sigkill_task = false;
> > restart:
> > 	rcu_read_lock();
> > 	for_each_process_thread(g, p) {
> > 		if (likely(!fatal_signal_pending(p)))
> > 			continue;
> > 		task_lock(p);
> > 		mm = p->mm;
> > 		if (mm && mm->mmap && !mm->mmap_zapped && down_read_trylock(&mm->mmap_sem)) {
>                                        ^^^^^^^^^^^^^^^
> 
> We do not want mm->mmap_zapped, it can't work. We need mm->needs_zap
> set by oom_kill_process() and cleared after zap_page_range().
> 
> Because otherwise we can not handle CLONE_VM correctly. Suppose that
> an innocent process P does vfork() and the child is killed but not
> exited yet. mm_zapper() can find the child, do zap_page_range(), and
> surprise its alive parent P which uses the same ->mm.

kill(P's-child, SIGKILL) does not kill P sharing the same ->mm.
Thus, mm_zapper() can be used for only OOM-kill case and
test_tsk_thread_flag(p, TIF_MEMDIE) should be used than
fatal_signal_pending(p).

> 
> And if we rely on MMF_MEMDIE or mm->needs_zap or whaveter then
> for_each_process_thread() doesn't really make sense. And if we have
> a single MMF_MEMDIE process (likely case) then the unconditional
> _trylock is suboptimal.

I guess the more likely case is that the OOM victim successfully exits
before mm_zapper() finds it.

I thought that a dedicated kernel thread which scans the task list can do
deferred zapping by automatically retrying (in a few seconds interval ?)
when down_read_trylock() failed. 

> 
> Tetsuo, can't we do something simple which "obviously can't hurt at
> least" and then discuss the potential improvements?

No problem. I can wait for your version.

> 
> And yes, yes, the "Kill all user processes sharing victim->mm" logic
> in oom_kill_process() doesn't 100% look right, at least wrt the change
> we discuss.

If we use test_tsk_thread_flag(p, TIF_MEMDIE), we will need to set
TIF_MEMDIE to the victim after sending SIGKILL to all processes sharing
the victim's mm. Well, the likely case that the OOM victim exits before
mm_zapper() finds it becomes not-so-likely case? Then, MMF_MEMDIE is
better than test_tsk_thread_flag(p, TIF_MEMDIE)...

> 
> Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/