Message-ID: <20100916233045.73aecc26@lilo>
Date: Thu, 16 Sep 2010 23:30:45 +0930
From: Christopher Yeoh <cyeoh@....ibm.com>
To: Brice Goglin <Brice.Goglin@...ia.fr>
Cc: linux-kernel@...r.kernel.org,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [RFC][PATCH] Cross Memory Attach
On Thu, 16 Sep 2010 11:15:10 +0200
Brice Goglin <Brice.Goglin@...ia.fr> wrote:
> On 16/09/2010 08:32, Brice Goglin wrote:
> > I am the guy doing KNEM, so I can comment on this. The I/OAT part of
> > KNEM was mostly a research topic; it's mostly useless on current
> > machines since memcpy performance is much higher than the I/OAT DMA
> > engine's. We also have an offload model with a kernel thread, but it
> > hasn't been used much so far. These features can be ignored for the
> > current discussion.
>
> I've just created a knem branch where I removed all of the above, and
> some other stuff that is not necessary for normal users. So it just
> contains the region management code and two commands to copy between
> regions or between a region and some local iovecs.
When I did the original hpcc runs for CMA vs the shared-memory double
copy, I also did some KNEM runs as a bit of a sanity check. The CMA
OpenMPI implementation actually uses the infrastructure KNEM put into
the OpenMPI shared mem btl - thanks for that btw, it made it much
easier for me to test CMA.
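For anyone who hasn't looked at the interface, the receive side is
essentially one syscall. Here's a minimal sketch using the
process_vm_readv() form that CMA was eventually merged as - how the
receiver learns the sender's pid and buffer address is left to the MPI
library's handshake, so treat those parameters as assumptions:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

/* Pull len bytes straight out of the sender's address space in one
 * copy.  sender_pid and remote_src would come from the btl handshake. */
static ssize_t cma_recv(pid_t sender_pid, void *dst,
                        void *remote_src, size_t len)
{
        struct iovec local  = { .iov_base = dst,        .iov_len = len };
        struct iovec remote = { .iov_base = remote_src, .iov_len = len };

        /* One syscall, one copy: the kernel moves the data directly
         * from the sender's pages into our buffer, with no bounce
         * through a shared segment. */
        return process_vm_readv(sender_pid, &local, 1, &remote, 1, 0);
}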
Interestingly, although KNEM and CMA are fundamentally doing very
similar things, at least with hpcc I didn't see as much of a gain with
KNEM as with CMA:
Naturally Ordered (MB/s)      4      8     16     32
Base                       1235    935    622    419
CMA                        4741   3769   1977    703
KNEM                       3362   3091   1857    681

Randomly Ordered (MB/s)       4      8     16     32
Base                       1227    947    638    412
CMA                        4666   3682   1978    710
KNEM                       3348   3050   1883    684

Max Ping Pong (MB/s)          4      8     16     32
Base                       2028   1938   1928   1882
CMA                        7424   7510   7598   7708
KNEM                       5661   5476   6050   6290
I don't know the reason behind the difference - whether it's something
peculiar to hpcc, or whether there's extra overhead in the way that
knem does setup for copying, or whether knem wasn't configured
optimally. I haven't done any comparison IMB or NPB runs yet.
Syscall and setup overhead do have some measurable effect. Although I
don't have the numbers for it here, neither KNEM nor CMA does quite as
well on hpcc as a hacked version of hpcc where everything is declared
ahead of time as shared memory, so the receiver can just do a single
copy from userspace - which I think is representative of the
theoretical maximum gain from the single-copy approach.
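To make that comparison concrete, the hacked hpcc amounts to something
like the sketch below: both ranks map one pre-created POSIX shared
segment, so a "receive" is a plain memcpy with no per-message syscall.
The segment name and size here are made-up illustrative values:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_NAME "/hpcc_buf"      /* hypothetical segment name */
#define BUF_SIZE (1 << 20)        /* hypothetical fixed size   */

/* Map the shared segment; the creating rank also sizes it. */
static void *map_shared_buf(int creator)
{
        int fd = shm_open(BUF_NAME, creator ? O_CREAT | O_RDWR : O_RDWR,
                          0600);
        if (fd < 0)
                return NULL;
        if (creator && ftruncate(fd, BUF_SIZE) < 0) {
                close(fd);
                return NULL;
        }
        void *p = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                /* the mapping stays valid */
        return p == MAP_FAILED ? NULL : p;
}

/* Receiver side: the whole transfer is a single userspace copy. */
static void shm_recv(void *dst, const void *shared_src, size_t len)
{
        memcpy(dst, shared_src, len);
}

The double-copy baseline differs in that the sender must first memcpy
into the segment and the receiver then memcpy out of it, which is
exactly the extra copy CMA and KNEM avoid.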
Chris
--
cyeoh@...ibm.com