lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1204091355280.10873@chino.kir.corp.google.com>
Date:	Mon, 9 Apr 2012 14:22:37 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	werner <w.landgraf@...ru>, Rik van Riel <riel@...hat.com>,
	Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
	Oleg Nesterov <oleg@...hat.com>,
	Rabin Vincent <rabin.vincent@...ricsson.com>,
	Christian Bejram <christian.bejram@...ricsson.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Anton Vorontsov <anton.vorontsov@...aro.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	stable@...r.kernel.org
Subject: Re: v3.4-rc2 out-of-memory problems (was Re: 3.4-rc1
 sticks-and-crashs)

On Mon, 9 Apr 2012, Linus Torvalds wrote:

> The real bug is actually that those notifiers are a f*cking joke, and
> the return value from the notifier is a mistake.
> 
> So I personally think that the real problem is this code in
> profile_handoff_task:
> 
>         return (ret == NOTIFY_OK) ? 1 : 0;
> 
> and ask yourself two questions:
> 
>  - what the hell does NOTIFY_OK/NOTIFY_DONE mean?
>  - what happens if there are multiple notifiers that all (or some)
> return NOTIFY_OK?
> 

NOTIFY_OK should never be a valid response for this notifier the way it's 
currently implemented, it should be NOTIFY_STOP to stop iterating the call 
chain to avoid a double free.  Right now it doesn't matter because only 
oprofile is actually freeing the task_struct and lowmemorykiller should be 
using NOTIFY_DONE.

Then we have a completeness issue if multiple callbacks want to return 
NOTIFY_STOP and an ordering issue if the oprofile callback is invoked 
before lowmemorykiller.

> I'll tell you what my answers are:
> 
>  (a) NOTIFY_DONE is the "ok, everything is fine, you can free the
> task-struct". It's also what that handoff notifier thing returns if
> there are no notifiers registered at all.
> 
>      So the fix to the Android lowmemorykiller is as simple as just
> changing NOTIFY_OK to NOTIFY_DONE, which will mean that the caller
> will properly free the task struct.
> 

I don't think so for Werner's config who also has CONFIG_OPROFILE=y, so 
oprofile would return NOTIFY_OK and queue the task_struct for free, then 
the second notifier callback to the lowmemorykiller would return 
NOTIFY_DONE which would result in put_task_struct() doing free_task() 
itself for a double free.

>      The NOTIFY_OK/NOTIFY_DONE difference really does seem to be just
> "NOTIFY_OK means that I will free the task myself later". That's what
> the oprofile uses, and it frees the task.
> 
>  (b) But the whole interface is a total f*cking mess. If *multiple*
> people return NOTIFY_OK, they're royally fucked. And the whole (and
> only) point of notifiers is that you can register multiple different
> ones independently.
> 
> So quite frankly, the *real* bug is not in that android driver
> (although I'd say that we should just make it return NOTIFY_DONE and
> be done with it). The real bug is that the whole f*cking notifier is a
> mistake, and checking the error return was the biggest mistake of all.
> 

Right, we can't handoff the freeing of the task_struct to more than one 
notifier.  It seems misdesigned from the beginning and what we really want 
is to hijack task->usage for __put_task_struct(task) if we have such a 
notifier callchain and require each one (currently just oprofile) to take 
a reference on task->usage for NOTIFY_OK and then be responsible for 
dropping the reference when it's done with it later instead of requiring 
it to free the task_struct itself.

That's _if_ we want to continue to have such an interface in the first 
place where it's only really necessary right now for oprofile (and, hence, 
wasn't implemented in an extendable way).  I'm thinking the 
lowmemorykiller, as I eluded to, could be written in a way where we can 
detect if a thread we've already killed has exited yet before killing 
another one.  We can't just store a pointer to the task_struct of the 
killed task since it could be reused for a fork later, but we could use 
TIF_MEMDIE like the oom killer does.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ