lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1003311342410.25284@chino.kir.corp.google.com>
Date:	Wed, 31 Mar 2010 14:07:15 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Oleg Nesterov <oleg@...hat.com>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	anfei <anfei.zhou@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	nishimura@....nes.nec.co.jp,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Mel Gorman <mel@....ul.ie>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [patch] oom: give current access to memory reserves if it has
 been killed

On Wed, 31 Mar 2010, Oleg Nesterov wrote:

> On 03/30, David Rientjes wrote:
> >
> > On Tue, 30 Mar 2010, Oleg Nesterov wrote:
> >
> > > Note that __oom_kill_task() does force_sig(SIGKILL) which assumes that
> > > ->sighand != NULL. This is not true if out_of_memory() is called after
> > > current has already passed exit_notify().
> >
> > We have an even bigger problem if current is in the oom killer at
> > exit_notify() since it has already detached its ->mm in exit_mm() :)
> 
> Can't understand... I thought that in theory even kmalloc(1) can trigger
> oom.
> 

__oom_kill_task() cannot be called on a task without an ->mm.

> > > IOW, unless I missed something, it is very easy to hide the process
> > > from oom-kill:
> > >
> > > 	int main()
> > > 	{
> > > 		pthread_create(memory_hog_func);
> > > 		syscall(__NR_exit);
> > > 	}
> > >
> >
> > The check for !p->mm was moved in the -mm tree (and the oom killer was
> > entirely rewritten in that tree, so I encourage you to work off of it
> > instead
> 
> OK, but I guess this !p->mm check is still wrong for the same reason.
> In fact I do not understand why it is needed in select_bad_process()
> right before oom_badness() which checks ->mm too (and this check is
> equally wrong).
> 

It prevents kthreads from being killed.  We already identify tasks that 
are in the exit path with PF_EXITING in select_bad_process() and chosen to 
make the oom killer a no-op when it's not current so it can exit and free 
its memory.  If it is current, then we're ooming in the exit path and we 
need to oom kill it so that it gets access to memory reserves so its no 
longer blocking.

> > so if the oom killer finds an already exiting task,
> > it will become a no-op since it should eventually free memory and avoids a
> > needless oom kill.
> 
> No, afaics, And this reminds that I already complained about this
> PF_EXITING check.
> 
> Once again, p is the group leader. It can be dead (no ->mm, PF_EXITING
> is set) but it can have sub-threads. This means, unless I missed something,
> any user can trivially disable select_bad_process() forever.
> 

The task is in the process of exiting and will do so if its not current, 
otherwise it will get access to memory reserves since we're obviously oom 
in the exit path.  Thus, we'll be freeing that memory soon or recalling 
the oom killer to kill additional tasks once those children have been 
reparented (or one of its children was sacrificed).

> 
> Well. Looks like, -mm has a lot of changes in oom_kill.c. Perhaps it
> would be better to fix these mt bugs first...
> 
> Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children).
> Again, this is not right even if we forget about !child->mm check.
> This list_for_each_entry() can only see the processes forked by the
> main thread.
>

That's the intention.
 
> Likewise, oom_kill_process()->list_for_each_entry() is not right too.
> 

Why?

> Hmm. Why oom_forkbomb_penalty() does thread_group_cputime() under
> task_lock() ? It seems, ->alloc_lock() is only needed for get_mm_rss().
> 

Right, but we need to ensure that the check for !child->mm || child->mm == 
tsk->mm fails before adding in get_mm_rss(child->mm).  It can race and 
detach its mm prior to the dereference.  It would be possible to move the 
thread_group_cputime() out of this critical section, but I felt it was 
better to do filter all tasks with child->mm == tsk->mm first before 
unnecessarily finding the cputime for them.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ