[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <adair6zjvq8.fsf@cisco.com>
Date: Tue, 28 Aug 2007 12:38:07 -0700
From: Roland Dreier <rdreier@...co.com>
To: David Miller <davem@...emloft.net>
Cc: tom@...ngridcomputing.com, jeff@...zik.org,
swise@...ngridcomputing.com, mshefty@...ips.intel.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
general@...ts.openfabrics.org
Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Sorry for the long latency, I was at the beach all last week.
> > And direct data placement really does give you a factor of two at
> > least, because otherwise you're stuck receiving the data in one
> > buffer, looking at some of the data at least, and then figuring out
> > where to copy it. And memory bandwidth is if anything becoming more
> > valuable; maybe LRO + header splitting + page remapping tricks can get
> > you somewhere but as NCPUS grows then it seems the TLB shootdown cost
> > of page flipping is only going to get worse.
> As Herbert has said already, people can code for this just like
> they have to code for RDMA.
No argument, you need to change the interface to take advantage of RDMA.
> There is no fundamental difference from converting an application
> to sendfile or similar.
Yes, on the transmit side, there's not much difference from sendfile
or splice, although RDMA may give a slightly nicer interface that also
gives basically the equivalent of AIO.
> The only thing this needs is a
> "recvmsg_I_dont_care_where_the_data_is()" call. There are no alignment
> issues unless you are trying to push this data directly into the
> page cache.
I don't understand how this gives you the same thing as direct data
placement (DDP). There are many situations where the sender knows
where the data has to go and if there's some way to pass that to the
receiver, so that info can be used in the receive path to put the data
in the right place, the receiver can save a copy. This is
fundamentally the same "offload" that an FC HBA does -- the SCSI
midlayer queues up commands like "read block A and put the data at
address X" and "read block B and put the data at address Y" and the
HBA matches tags on incoming data to put the blocks at the right
addresses, even if block B is received before block A.
RFC 4297 has some discussion of the various approaches, and while you
might not agree with their conclusions, it is interesting reading.
> Couple this with a card that makes sure that on a per-page basis, only
> data for a particular flow (or group of flows) will accumulate.
It seems that the NIC would also have to look into a TCP stream (and
handle out of order segments etc) to find message boundaries for this
to be equivalent to what an RDMA NIC does.
- R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists