Message-ID: <20090213193619.GH6854@linux.vnet.ibm.com>
Date:	Fri, 13 Feb 2009 11:36:19 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Bryan Wu <cooloney@...nel.org>, linux-kernel@...r.kernel.org,
	ltt-dev@...ts.casi.polymtl.ca,
	uclinux-dist-devel@...ckfin.uclinux.org
Subject: Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux
	(repost)

On Fri, Feb 13, 2009 at 01:54:11PM -0500, Mathieu Desnoyers wrote:
> * Linus Torvalds (torvalds@...ux-foundation.org) wrote:
> > 
> > 
> > Btw, for user space, if you want to do this all right for something like 
> > BF, I think the only _correct_ thing to do (in the sense that the end 
> > result will actually be debuggable) is to essentially give full SMP 
> > coherency in user space.
> > 
> > It's doable, but rather complicated, and I'm not 100% sure it really ends 
> > up making sense. The way to do it is to just simply say:
> > 
> >  - never map the same page writably on two different cores, and always 
> >    flush the cache (on the receiving side) when you switch a page from one 
> >    core to another.
> > 
> > Now, the kernel can't really do that reasonably, but user space possibly could.
> > 
> > Now, I realize that blackfin doesn't actually even have a MMU or a TLB, so 
> > by "mapping the same page" in that case we end up really meaning "having a 
> > shared mapping or thread". I think that _should_ be doable. The most 
> > trivial approach might be to simply limit all processes with shared 
> > mappings or CLONE_VM to core 0, and letting core 1 run everything else 
> > (but you could do it differently: mapping something with MAP_SHARED would 
> > force you to core 0, but threads would just force the thread group to 
> > stay on _one_ core, rather than necessarily a fixed one).
> > 
> > Yeah, because of the lack of real memory protection, the kernel can't 
> > _know_ that processes don't behave badly and access things that they 
> > didn't explicitly map, but I'm hoping that that is rare.
> > 
> > And yes, if you really want to use threads as a way to do something 
> > across cores, you'd be screwed - the kernel would only schedule the 
> > threads on one CPU. But considering the undefined nature of threading on 
> > such a CPU, wouldn't that still be preferable? Wouldn't it be nice to have 
> > the knowledge that user space _looks_ cache-coherent by virtue of the 
> > kernel just limiting cores appropriately?
> > 
> > And then user space would simply not need to worry as much. Code written 
> > for another architecture will "just work" on BF SMP too. With the normal 
> > uclinux limitations, of course.
> > 
> > 			Linus
> > 
> 
> I don't know enough about BF to tell for sure, but the other approach I
> see that would still permit running threads with a shared memory space
> on different CPUs is to call a cache flush, _from_ userspace, each time
> a userspace lock is taken or released (at the synchronization points
> where the "magic test-and-set instruction" is used).
> 
> If some more elaborate userspace MT code uses something else than those
> basic locks provided by core libraries to synchronize data exchange,
> then it would be on its own and have to ensure cache flushing itself.
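
A minimal sketch of that flush-at-synchronization-points idea, assuming a
hypothetical flush_data_cache() helper that writes back and invalidates
the local core's data cache (the real primitive would be platform-specific,
and the mutex word itself is assumed handled by the platform's "magic
test-and-set"):

#include <stddef.h>
#include <pthread.h>

/* Hypothetical platform primitive: write back and invalidate the local
 * core's data cache lines covering [addr, addr + len). */
void flush_data_cache(void *addr, size_t len);

struct coherent_lock {
	pthread_mutex_t mutex;
	void *shared;		/* the data protected by this lock */
	size_t len;
};

void coherent_lock_acquire(struct coherent_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	/* Discard stale lines so we read what the previous holder,
	 * possibly on another core, wrote back to memory. */
	flush_data_cache(l->shared, l->len);
}

void coherent_lock_release(struct coherent_lock *l)
{
	/* Push our modifications out to memory before the next
	 * acquirer's flush-and-read. */
	flush_data_cache(l->shared, l->len);
	pthread_mutex_unlock(&l->mutex);
}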

How about just doing a sched_setaffinity() in the BF case?  Sounds
like an easy way to implement Linus's suggestion of restricting the
multithreaded processes to a single core.  I have a hard time losing
sleep over the lack of parallelism in the case where the SMP support is
at best rudimentary...
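
For instance (just a sketch; the choice of core 0 matches the convention
above):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int confine_to_core0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);

	/* pid 0 means the calling thread; threads created afterwards
	 * inherit the mask, so calling this before spawning anything
	 * keeps the whole thread group on core 0. */
	if (sched_setaffinity(0, sizeof(set), &set) == -1) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}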

> And yes, that would be incredibly costly/slow. This is why RCU-style
> reader-sides are good: they have much more relaxed synchronization
> constraints.
> 
> I am just thinking that the single-process-to-a-single-core solution
> you propose above will be somewhat limiting if we end up with a 64-core
> non-cache-coherent architecture. Such architectures tend to be used
> especially for stuff like video decoding, which is very easy to
> parallelize when shared memory is available. But I guess we are not
> there yet.

If someone invests the silicon for 64 cores, but doesn't provide some
semblance of cache coherence, I have to question their sanity.  As a
kludgey quick fix to get to a dual-proc solution I can understand it,
but there is a limit!  ;-)
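
For contrast, the RCU-style read side Mathieu describes stays cheap. A
sketch of a reader using the liburcu API (the config structure and the
live_config pointer are illustrative):

#include <urcu.h>

struct config { int value; };
static struct config *live_config;	/* updated via rcu_assign_pointer() */

int read_value(void)
{
	struct config *c;
	int v;

	/* Each reader thread must have called rcu_register_thread()
	 * once beforehand.  The read side itself takes no lock and
	 * needs no cache flush or atomic read-modify-write. */
	rcu_read_lock();
	c = rcu_dereference(live_config);
	v = c ? c->value : -1;
	rcu_read_unlock();
	return v;
}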

						Thanx, Paul
