linux-kernel - Re: Buggy IPI and MTRR code on low memory

Date:	Thu, 29 Jan 2009 09:55:38 +1030
From:	Rusty Russell <rusty@...tcorp.com.au>
To:	Steven Rostedt <rostedt@...dmis.org>, Nick Piggin <npiggin@...e.de>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: Buggy IPI and MTRR code on low memory

On Thursday 29 January 2009 03:08:14 Steven Rostedt wrote:
> 
> While developing the RT git tree I came across this deadlock.
> 
> To avoid touching the memory allocator in smp_call_function_many I forced 
> the stack use case, the path that would be taken if data fails to 
> allocate.
> 
> Here's the current code in kernel/smp.c:

Interesting.  I simplified smp_call_function_ma{sk,ny}, and introduced this bug (see 54b11e6d57a10aa9d0009efd93873e17bffd5d30).

We used to wait on OOM, yes, but we didn't do them one at a time.

We could restore that quiesce code, or call a function on all online cpus using on-stack data, and have them atomic_dec a counter when they're done (I'm not sure why we didn't do this in the first place: Nick?)
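
A rough userspace sketch of the on-stack variant, to make the idea concrete. This is not kernel code: pthreads stand in for cpus, and struct stack_call, NR_WORKERS, receiver(), bump() and call_on_all() are made-up names for illustration. It reads "when they're done" as "after the function has run", which is only one possible reading:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_WORKERS 4	/* stand-in for the number of online cpus */

/* Hypothetical on-stack call descriptor; not the kernel's call_single_data. */
struct stack_call {
	void (*func)(void *info);
	void *info;
	atomic_int pending;	/* receivers that have not finished yet */
};

static void *receiver(void *arg)
{
	struct stack_call *call = arg;

	call->func(call->info);
	/* Last touch of *call: after this the caller may leave its frame. */
	atomic_fetch_sub_explicit(&call->pending, 1, memory_order_release);
	return NULL;
}

/* Example callback: count how many receivers ran it. */
static void bump(void *info)
{
	atomic_fetch_add((atomic_int *)info, 1);
}

/* Run func on every worker at once using only on-stack data. */
static int call_on_all(void (*func)(void *), void *info)
{
	struct stack_call call = { .func = func, .info = info };
	pthread_t tid[NR_WORKERS];
	int i;

	atomic_init(&call.pending, NR_WORKERS);
	for (i = 0; i < NR_WORKERS; i++) {
		int err = pthread_create(&tid[i], NULL, receiver, &call);
		assert(err == 0);
		(void)err;
	}
	/* All receivers run in parallel; wait once for the whole set. */
	while (atomic_load_explicit(&call.pending, memory_order_acquire) != 0)
		sched_yield();
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&call.pending);	/* 0 once everyone is done */
}
```

The point is that the caller waits once for the whole set rather than one cpu at a time. It still waits for the function itself to finish, so a callback that blocks until the caller does something afterwards would hang here too.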

> The problem is that if we use the stack, then we must wait for the 
> function to finish. But in the mtrr code, the called functions are waiting 
> for the caller to do something after the smp_call_function. Thus we 
> deadlock! This mtrr code seems to have been there for a while. At least 
> longer than the git history.

I don't see how that *ever* worked then, even with the quiesce stuff.

> The patch creates another flag called CSD_FLAG_RELEASE. If we fail
> to alloc the data and the wait bit is not set, we still use the stack
> but we also set this flag instead of the wait flag. The receiving IPI 
> will copy the data locally, and if this flag is set, it will clear it. The 
> caller, after sending the IPI, will wait on this flag to be cleared.

Doesn't this break with more than one cpu?  I think a refcnt is needed for the general case...
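
To illustrate what I mean, here is a userspace sketch (pthreads standing in for cpus; struct call_data, NR_RECEIVERS, wait_for_caller() and run_demo() are invented for the example, and none of this is the proposed patch). Each receiver copies func/info off the caller's stack and then drops a reference; the caller only waits for the count to reach zero, not for the function to finish. With a single flag, the first receiver to clear it would let the caller go while the others were still reading the stack data:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_RECEIVERS 4	/* stand-in for the target cpus */

/* Hypothetical descriptor living on the caller's stack. */
struct call_data {
	void (*func)(void *info);
	void *info;
	atomic_int refs;	/* receivers that still have to copy func/info */
};

static atomic_int go;		/* set by the caller after the "IPI" returns */
static atomic_int finished;	/* receivers whose callback completed */

/* Like the mtrr case: the callback waits for the caller to act. */
static void wait_for_caller(void *info)
{
	(void)info;
	while (!atomic_load(&go))
		sched_yield();
	atomic_fetch_add(&finished, 1);
}

static void *receiver(void *arg)
{
	struct call_data *data = arg;
	void (*func)(void *) = data->func;	/* local copies */
	void *info = data->info;

	atomic_fetch_sub(&data->refs, 1);	/* done with *data */
	func(info);				/* uses the copies only */
	return NULL;
}

static int run_demo(void)
{
	struct call_data data = { .func = wait_for_caller, .info = NULL };
	pthread_t tid[NR_RECEIVERS];
	int i;

	atomic_store(&go, 0);
	atomic_store(&finished, 0);
	atomic_init(&data.refs, NR_RECEIVERS);
	for (i = 0; i < NR_RECEIVERS; i++) {
		int err = pthread_create(&tid[i], NULL, receiver, &data);
		assert(err == 0);
		(void)err;
	}
	/* Wait until every receiver has copied the data, not until func ends. */
	while (atomic_load(&data.refs) != 0)
		sched_yield();
	/* The caller is now free to do the work the callbacks are waiting on. */
	atomic_store(&go, 1);
	for (i = 0; i < NR_RECEIVERS; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&finished);
}
```

Here the callbacks block until the caller sets go, yet nothing deadlocks, because the caller is released as soon as the last reference is dropped.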

Rusty.