lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 28 Mar 2011 11:06:55 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	Will Newton <will.newton@...il.com>,
	Luke Kenneth Casson Leighton <luke.leighton@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: advice sought: practicality of SMP cache coherency implemented
 in assembler (and a hardware detect line)

On Sat, Mar 26, 2011 at 12:08:47PM +0000, Alan Cox wrote:
> > Probably not. Is it a virtual or physical indexed cache? Do you have a
> > precise workload in mind? If you have a very precise workload and you
> > don't expect to get many write conflicts then it could be made to
> > work.
> 
> I'm unconvinced. The user space isn't the hard bit - little user memory
> is shared writable, the kernel data structures on the other hand,
> especially in the RCU realm are going to be interesting.

Indeed.  One approach is to flush the caches on each rcu_dereference().
Of course, this assumes that the updaters flush their caches on each
smp_wmb().  You probably also need to make ACCESS_ONCE() flush caches
(which would automatically take care of rcu_dereference()).  So might
work, but won't be fast.

You can of course expect a lot of odd bugs in taking this approach.
The assumption of cache coherence is baked pretty deeply into most
shared-memory parallel software.  As you might have heard in the 2005
discussion.  ;-)

> > There are a number of mature cores out there that can do this already
> > and can be bought off the shelf, I wouldn't underestimate the
> > difficulty of getting your cache coherency protocol right particularly
> > on a limited time/resource budget.
> 
> Architecturally you may want to look at running one kernel per device
> (remembering that you can share the non writable kernel pages between
> different instances a bit if you are careful) - and in theory certain
> remote mappings.
> 
> Basically it would become a cluster with a very very fast "page transfer"
> operation for moving data between nodes.

This works for applications coded specially for this platform, but unless
I am missing something, not for existing pthreads applications.  Might
be able to handle things like Erlang that do parallelism without shared
memory.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ