[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8b0e275-bb29-0358-c049-e54b0960c858@oracle.com>
Date: Fri, 6 Jul 2018 22:36:24 +0800
From: Ka-Cheong Poon <ka-cheong.poon@...cle.com>
To: Sowmini Varadhan <sowmini.varadhan@...cle.com>
Cc: netdev@...r.kernel.org, santosh.shilimkar@...cle.com,
davem@...emloft.net, rds-devel@....oracle.com
Subject: Re: [PATCH v2 net-next 0/3] rds: IPv6 support
On 07/06/2018 01:44 AM, Sowmini Varadhan wrote:
> - rds_getname(): one of the existing properties of PF_RDS is that a
> connect() does not involve an implicit bind(). Note that you are basing
> the changes in rds_bind() on this behavior, thus I guess we make the
> choice of treating this as a feature, not a bug.
This patch series does not change existing behavior. But
I think this is a strange RDS semantics as it differs from
other types of socket. But this is not about IPv6 support
and can be dealt with later.
> Since we are choosing to treat this behavior as a feature we
> need to be consistent in rds_getname(). If we find
> (!peer and !ipv6_addr_any(rs_conn_addr)) and the socket is not yet bound,
> the returned address (address size, address family) should be based
> on the rs_conn_addr. (Otherwise if I connect to abc::1 without doing a
> bind(), and try to do a getsockname(), I get back a sockaddr_in?!)
OK, will change that.
> - rds_cancel_sent_to() and rds_connect and rds_bind and rds_sendmsg
> As DaveM has already pointed out, we should be using sa_family to figure
> out sockaddr_in vs sockaddr_in6 (not the other way around of inspecting
> len first, and then the family- that way wont work if I pass in
> sockaddr_storage). E.g., see inet_dgram_connect.
>
> if (addr_len < sizeof(uaddr->sa_family))
> return -EINVAL;
OK, will change that.
> - In net/rds/rds.h;
>
> /* The following ports, 16385, 18634, 18635, are registered with IANA as
> * the ports to be used for RDS over TCP and UDP. 18634 is the historical
>
> What is "RDS over TCP and UDP"? There is no such thing as RDS-over-UDP.
> IN fact RDS has nothing to do with UDP. The comment is confused. See next
> item below, where the comment disappears.
As mentioned in the above comments, they are registered with IANA.
There is no RDS/UDP in Linux but the port numbers are registered
nevertheless. And RDS can work over UDP AFAICT. BTW, it is really
strange that RDS/TCP does not use the port number already registered.
Anyway, the above comments are just a note on this fact in the IANA
database. The following is a link to the IANA assignment.
https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
> - Also in net/rds/rds.h
> Please dont define transport specific parameters like RD_CM_PORT in the
> common rds.h header. It is unfortunate that we already have RDS_PORT there,
> and we should try to clean that up as well. NOte that RDS_TCP_PORT
> is now in the correct transport-module-specific header (net/rds/tcp.h)
> and its unclean to drag it from there, into the common header as you are
> doing.
>
> In fact I just tried to move the RDS_PORT definition into
> net/rds/rdma_transport.h and it built just-fine. So to summarize,
> please do the following:
> 1. move RDS_PORT into rdma_transport.h
> 2. add RDS_CM_PORT into rdma_transport.h
> 3. stop dragging RDS_TCP_PORT from its current happy home in net/rds/tcp.h
> to net/rds/rds.h
IMHO, there should only be RDS_PORT in the first place. And it
can be used for all transports. For example, if RDS/SCTP is added,
it can also use the same port number. There is no need to define
a new port number for each transport. What is the rationale that
each transport needs to have its own number and be defined in its
own header file?
> - net/rds/connection.c
> As we have discussed offline before, the fact that we cannot report
> TCP seq# etc info via the existing rds-info API is not "a bug in the
> design of MPRDS" but rather a lacking in the API design. Moreover,
> much of the useful information around the TCP socket is already
> available via procfs, TCP_INFO etc, so the info by rds-info is rarely
> used for rds-tcp- the more useful information is around the RDS socket
> itself. So there is a bug in the comment, would be nice if you removed it.
If the behavior of a software component is modified/enhanced such
that the existing API of this component has problems dealing with
this new behavior, should the API be modified or a new API be added
to handle that? If neither is done, shouldn't this be considered
a bug?
> Also, while you are there, s/exisiting/existing, please.
OK, with change that.
> General comments:
> -----------------
> I remain unconvinced by your global <-> link-local arguments.
>
> For UDP sockets we can do this:
>
> eth0
> host1 -------------------------------------- host2
> abc::1/64 fe80::2
> add abc::/64 as onlink subnet route
>
>
> host1# traceroute6 -i eth0 -s abc::1 fe80::2
>
> You just broke this for RDS and are using polemic to defend your case,
> but the main thrust of your diatribe seems to be "why would you need
> this?" I'll try to address that briefly here.
It is unclear why you need to use words like "polemic"
and "diatribe". What I wrote is in public and folks can
judge whether they are personal in nature. I'd suggest
that discussion should be done in a professional manner.
> - There may be lot of valid reasons why host2 does not want to be
> configured with a global prefix. e.g., I only want host2 to be able
> to talk to onlink hosts.
Right. It can use link local address to do this. And
peers can also use link local address to communicate with
host2.
> - RDS mandatorily requires sockets to be bound. So the normal src addr
> selection (that would have caused host1 to use a link-local to talk
> to host2) is suppressed in this case
This requirement may be something to be fixed in future. For
example, one can do "rds-ping fe80::x%y" without specifying
the source address to use and it works fine. The trick is
that rds-ping uses an UDP socket to find the source address to
use and then bind() the RDS socket to that address. So in this
rds-ping example, it will use the correct link local address
as specified in RFC 6724 to set up the RDS connection. This
trick really should not be needed. But as I mentioned, this
patch series does not change existing behavior.
> This is exactly the same as a UDP socket bound to abc::1
>
> Note well, that one of the use-cases for RDS-TCP is to replace
> existing infra that uses UDP for cluster-IPC. This has come up before
> on netdev:
>
> See https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+%5C%5BPATCH+net%5C-next+0%5C%2F6%5C%5D+kcm%5C%3A+Kernel+Connection+Multiplexor+%5C%28KCM%5C%29%22&o=newest&f=1
>
> so feature parity with udp is just as important as feature-parity
> for rds_rdma.
FWIW, RDS is on top of other transport protocols. It does not
really need to provide all services provided by the underlying
protocols.
> I hope that helps you see why we need to not break this gratuituously
> for rds-tcp.
Please note that I understand what you wrote. My objection
is that it requires special set up and can lead to problems
when people actually try to do that. RDS runs on other
platforms too. This will frustrate users and generate calls.
And RDS apps do not really need that as link local address i
always available. Having said this, I will add the support
to handle it to save further discussion.
--
K. Poon
ka-cheong.poon@...cle.com
Powered by blists - more mailing lists