Date:	Thu, 23 Apr 2015 09:25:16 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Jerome Glisse <j.glisse@...il.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	jglisse@...hat.com, mgorman@...e.de, aarcange@...hat.com,
	riel@...hat.com, airlied@...hat.com,
	aneesh.kumar@...ux.vnet.ibm.com,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Mark Hairgrove <mhairgrove@...dia.com>,
	Geoffrey Gerfin <ggerfin@...dia.com>,
	John McKenna <jmckenna@...dia.com>, akpm@...ux-foundation.org
Subject: Re: Interacting with coherent memory on external devices

On Thu, 23 Apr 2015, Benjamin Herrenschmidt wrote:

> They are via MMIO space. The big differences here are that via CAPI the
> memory can be fully cachable and thus have the same characteristics as
> normal memory from the processor point of view, and the device shares
> the MMU with the host.
>
> Practically what that means is that the device memory *is* just some
> normal system memory with a larger distance. The NUMA model is an
> excellent representation of it.

I sure wish you were working on using these features to increase
performance and the speed of communication with devices.

Device memory is inherently different from main memory (otherwise the
device would be using main memory) and thus not really NUMA. NUMA at least
assumes that the basic characteristics of memory are the same while just
the access speeds vary. GPU memory has very different performance
characteristics and the various assumptions on memory that the kernel
makes for the regular processors may not hold anymore.

> > For my use cases the advantage of CAPI lies in the reduction of latency
> > for coprocessor communication. I hope that CAPI will allow fast cache to
> > cache transactions between a coprocessor and the main one. This is
> > improving the ability to exchange data rapidly between application code
> > and some piece of hardware (NIC, GPU, custom hardware, etc.)
> >
> > Fundamentally this is currently a design issue since CAPI is running on
> > top of PCI-E and PCI-E transactions establish a minimum latency that
> > cannot be avoided. So it's hard to see how CAPI can improve the situation.
>
> It's on top of the lower layers of PCIe yes, I don't know the exact
> latency numbers. It does enable the device to own cache lines though and
> vice versa.

Could you come up with a way to allow faster device communication by
improving on the PCI-E cacheline handoff via CAPI? That is the kind of
useful thing I expected from it. If the processor can transfer a word
into a CAPI device faster, or get status back faster, then that is a
valuable thing.

