[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180705174444.GA24483@oracle.com>
Date: Thu, 5 Jul 2018 13:44:44 -0400
From: Sowmini Varadhan <sowmini.varadhan@...cle.com>
To: Ka-Cheong Poon <ka-cheong.poon@...cle.com>
Cc: netdev@...r.kernel.org, santosh.shilimkar@...cle.com,
davem@...emloft.net, rds-devel@....oracle.com
Subject: Re: [PATCH v2 net-next 0/3] rds: IPv6 support
Some additional comments on this patchset (consolidated here,
please tease this apart into patch1/patch2/patch3 as appropriate)
I looked at the most of rds-core, and the rds-tcp changes.
Please make sure santosh looks at these carefully, especially.
- RDS bind key changes
- connection.c
- all the rds_rdma related changes (e.g., the ib* and rdma* files)
- rds_getname(): one of the existing properties of PF_RDS is that a
connect() does not involve an implicit bind(). Note that you are basing
the changes in rds_bind() on this behavior, thus I guess we make the
choice of treating this as a feature, not a bug.
Since we are choosing to treat this behavior as a feature we
need to be consistent in rds_getname(). If we find
(!peer and !ipv6_addr_any(rs_conn_addr)) and the socket is not yet bound,
the returned address (address size, address family) should be based
on the rs_conn_addr. (Otherwise if I connect to abc::1 without doing a
bind(), and try to do a getsockname(), I get back a sockaddr_in?!)
- rds_cancel_sent_to() and rds_connect and rds_bind and rds_sendmsg
As DaveM has already pointed out, we should be using sa_family to figure
out sockaddr_in vs sockaddr_in6 (not the other way around of inspecting
len first, and then the family- that way wont work if I pass in
sockaddr_storage). E.g., see inet_dgram_connect.
if (addr_len < sizeof(uaddr->sa_family))
return -EINVAL;
- In net/rds/rds.h;
/* The following ports, 16385, 18634, 18635, are registered with IANA as
* the ports to be used for RDS over TCP and UDP. 18634 is the historical
What is "RDS over TCP and UDP"? There is no such thing as RDS-over-UDP.
IN fact RDS has nothing to do with UDP. The comment is confused. See next
item below, where the comment disappears.
- Also in net/rds/rds.h
Please dont define transport specific parameters like RD_CM_PORT in the
common rds.h header. It is unfortunate that we already have RDS_PORT there,
and we should try to clean that up as well. NOte that RDS_TCP_PORT
is now in the correct transport-module-specific header (net/rds/tcp.h)
and its unclean to drag it from there, into the common header as you are
doing.
In fact I just tried to move the RDS_PORT definition into
net/rds/rdma_transport.h and it built just-fine. So to summarize,
please do the following:
1. move RDS_PORT into rdma_transport.h
2. add RDS_CM_PORT into rdma_transport.h
3. stop dragging RDS_TCP_PORT from its current happy home in net/rds/tcp.h
to net/rds/rds.h
- net/rds/connection.c
As we have discussed offline before, the fact that we cannot report
TCP seq# etc info via the existing rds-info API is not "a bug in the
design of MPRDS" but rather a lacking in the API design. Moreover,
much of the useful information around the TCP socket is already
available via procfs, TCP_INFO etc, so the info by rds-info is rarely
used for rds-tcp- the more useful information is around the RDS socket
itself. So there is a bug in the comment, would be nice if you removed it.
Also, while you are there, s/exisiting/existing, please.
General comments:
-----------------
I remain unconvinced by your global <-> link-local arguments.
For UDP sockets we can do this:
eth0
host1 -------------------------------------- host2
abc::1/64 fe80::2
add abc::/64 as onlink subnet route
host1# traceroute6 -i eth0 -s abc::1 fe80::2
You just broke this for RDS and are using polemic to defend your case,
but the main thrust of your diatribe seems to be "why would you need
this?" I'll try to address that briefly here.
- There may be lot of valid reasons why host2 does not want to be
configured with a global prefix. e.g., I only want host2 to be able
to talk to onlink hosts.
- RDS mandatorily requires sockets to be bound. So the normal src addr
selection (that would have caused host1 to use a link-local to talk
to host2) is suppressed in this case
This is exactly the same as a UDP socket bound to abc::1
Note well, that one of the use-cases for RDS-TCP is to replace
existing infra that uses UDP for cluster-IPC. This has come up before
on netdev:
See https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+%5C%5BPATCH+net%5C-next+0%5C%2F6%5C%5D+kcm%5C%3A+Kernel+Connection+Multiplexor+%5C%28KCM%5C%29%22&o=newest&f=1
so feature parity with udp is just as important as feature-parity
for rds_rdma.
I hope that helps you see why we need to not break this gratuituously
for rds-tcp.
BTW, etiquette is to cc folks who have offered review comments on the
code. Please make sure to cc me in follow-ups to this thread.
Thank you.
--Sowmini
Powered by blists - more mailing lists