Message-ID: <20110328180655.GI2287@linux.vnet.ibm.com>
Date: Mon, 28 Mar 2011 11:06:55 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: Will Newton <will.newton@...il.com>,
Luke Kenneth Casson Leighton <luke.leighton@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: advice sought: practicality of SMP cache coherency implemented
in assembler (and a hardware detect line)
On Sat, Mar 26, 2011 at 12:08:47PM +0000, Alan Cox wrote:
> > Probably not. Is it a virtual or physical indexed cache? Do you have a
> > precise workload in mind? If you have a very precise workload and you
> > don't expect to get many write conflicts then it could be made to
> > work.
>
> I'm unconvinced. The user space isn't the hard bit - little user memory
> is shared writable; the kernel data structures, on the other hand,
> especially in the RCU realm, are going to be interesting.
Indeed. One approach is to flush the caches on each rcu_dereference().
Of course, this assumes that the updaters flush their caches on each
smp_wmb(). You probably also need to make ACCESS_ONCE() flush caches
(which would automatically take care of rcu_dereference()). So this
might work, but it won't be fast.
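To make that slightly more concrete, something along these lines (a rough
sketch only -- flush_dcache_writeback_all() and invalidate_dcache_all()
are placeholders for whatever primitives the hardware actually provides,
not existing kernel APIs):

        /* Hypothetical wrappers for a non-cache-coherent port. */
        #define nocc_smp_wmb() do {                                     \
                flush_dcache_writeback_all(); /* push dirty lines out */ \
                smp_wmb();                                              \
        } while (0)

        #define nocc_rcu_dereference(p) ({                              \
                invalidate_dcache_all();      /* drop stale lines first */ \
                rcu_dereference(p);                                     \
        })

Every reader would then pay for a full invalidate on each pointer fetch,
which is where the "won't be fast" comes from.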
You can of course expect a lot of odd bugs when taking this approach.
The assumption of cache coherence is baked pretty deeply into most
shared-memory parallel software, as you might have heard in the 2005
discussion. ;-)
> > There are a number of mature cores out there that can do this already
> > and can be bought off the shelf, I wouldn't underestimate the
> > difficulty of getting your cache coherency protocol right particularly
> > on a limited time/resource budget.
>
> Architecturally you may want to look at running one kernel per device
> (remembering that you can share the non-writable kernel pages between
> different instances a bit if you are careful) - and in theory certain
> remote mappings.
>
> Basically it would become a cluster with a very very fast "page transfer"
> operation for moving data between nodes.
This works for applications coded specially for this platform, but unless
I am missing something, not for existing pthreads applications. It might
be able to handle things like Erlang that do parallelism without shared
memory.
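For example, even a trivial pthreads program assumes that a store made
by one thread becomes visible to another purely because both map the
same memory. The sketch below is hypothetical, but it shows the idiom
that a cluster of per-device kernels cannot honor without coherent
shared memory:

        /* A store by one thread is expected to be visible to another
         * through ordinary shared memory; nothing in this program would
         * carry the store from one non-coherent node to another. */
        #include <pthread.h>
        #include <stdio.h>

        static int data;
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        static void *producer(void *arg)
        {
                pthread_mutex_lock(&lock);
                data = 42;              /* plain store into shared memory */
                pthread_mutex_unlock(&lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t tid;

                pthread_create(&tid, NULL, producer, NULL);
                pthread_join(tid, NULL);
                pthread_mutex_lock(&lock);
                printf("%d\n", data);   /* assumes the store is visible here */
                pthread_mutex_unlock(&lock);
                return 0;
        }

Erlang-style message passing avoids this because data moves between
processes only by explicit copy, which maps naturally onto a fast
page-transfer operation.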
Thanx, Paul