Message-ID: <1429756200.4915.19.camel@kernel.crashing.org>
Date: Thu, 23 Apr 2015 12:30:00 +1000
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Christoph Lameter <cl@...ux.com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Jerome Glisse <j.glisse@...il.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
jglisse@...hat.com, mgorman@...e.de, aarcange@...hat.com,
riel@...hat.com, airlied@...hat.com,
aneesh.kumar@...ux.vnet.ibm.com,
Cameron Buschardt <cabuschardt@...dia.com>,
Mark Hairgrove <mhairgrove@...dia.com>,
Geoffrey Gerfin <ggerfin@...dia.com>,
John McKenna <jmckenna@...dia.com>, akpm@...ux-foundation.org
Subject: Re: Interacting with coherent memory on external devices
On Wed, 2015-04-22 at 11:16 -0500, Christoph Lameter wrote:
> On Wed, 22 Apr 2015, Paul E. McKenney wrote:
>
> > I completely agree that some critically important use cases, such as
> > yours, will absolutely require that the application explicitly choose
> > memory placement and have the memory stay there.
>
> Most of what you are trying to do here is already there and has been done.
> GPU memory is accessible. NICs work, etc. All without CAPI. What
> exactly are the benefits of CAPI? Is it driver simplification? Reduction of
> overhead? If so, then the measures proposed are a bit radical and
> may result in just the opposite.
They are accessible via MMIO space. The big differences here are that via
CAPI the memory can be fully cacheable, and thus has the same characteristics
as normal memory from the processor's point of view, and that the device
shares the MMU with the host.
Practically, what that means is that the device memory *is* just some
normal system memory at a larger distance. The NUMA model is an
excellent representation of it.
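
As a rough illustration (just a sketch, not existing code: it assumes the
CAPI-attached memory is exposed as a CPU-less NUMA node, arbitrarily node 1
here, and uses libnuma), an application could place a buffer on the device's
memory the same way it would on any other distant node:

#include <numa.h>	/* libnuma; link with -lnuma */
#include <stdio.h>
#include <string.h>

int main(void)
{
	size_t len = 1 << 20;	/* 1 MiB buffer */
	int device_node = 1;	/* hypothetical node id for the device memory */
	void *buf;

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support on this system\n");
		return 1;
	}

	/* Bind the allocation to the device's node; it stays there unless
	 * something explicitly migrates it. */
	buf = numa_alloc_onnode(len, device_node);
	if (!buf) {
		perror("numa_alloc_onnode");
		return 1;
	}

	memset(buf, 0, len);	/* touch the pages so they are faulted in on that node */

	/* ... hand buf to the device, which shares the host MMU ... */

	numa_free(buf, len);
	return 0;
}

From the application's point of view nothing CAPI-specific is involved,
which is the point of representing the device memory as a NUMA node.
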
> For my use cases the advantage of CAPI lies in the reduction of latency
> for coprocessor communication. I hope that CAPI will allow fast cache-to-cache
> transactions between a coprocessor and the main processor. This improves the
> ability to exchange data rapidly between application code
> and some piece of hardware (NIC, GPU, custom hardware, etc.).
>
> Fundamentally this is currently a design issue, since CAPI runs on
> top of PCI-E and PCI-E transactions establish a minimum latency that
> cannot be avoided. So it's hard to see how CAPI can improve the situation.
It's on top of the lower layers of PCIe, yes; I don't know the exact
latency numbers. It does enable the device to own cache lines, though, and
vice versa.
> The new things about CAPI are the cache-to-cache transactions and the
> participation in cache coherency at the cacheline level. That is a
> different approach from the device-memory-oriented PCI transactions.
> Perhaps even CAPI over PCI-E can improve the situation there (maybe the
> transactions have lower latency than going to device memory), and hopefully
> CAPI will not forever be bound to PCI-E and will thus at some point shake
> off the shackles of a bus designed by a competitor.
Ben.