netdev - Re: [PATCH v2 net-next 0/3] rds: IPv6 support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180705174444.GA24483@oracle.com>
Date:   Thu, 5 Jul 2018 13:44:44 -0400
From:   Sowmini Varadhan <sowmini.varadhan@...cle.com>
To:     Ka-Cheong Poon <ka-cheong.poon@...cle.com>
Cc:     netdev@...r.kernel.org, santosh.shilimkar@...cle.com,
        davem@...emloft.net, rds-devel@....oracle.com
Subject: Re: [PATCH v2 net-next 0/3] rds: IPv6 support


Some additional comments on this patchset (consolidated here, 
please tease this apart into patch1/patch2/patch3 as appropriate)

I looked at the most of rds-core, and the rds-tcp changes.
Please make sure santosh looks at these carefully, especially.
    - RDS bind key  changes
    - connection.c
    - all the rds_rdma related changes (e.g., the ib* and rdma* files)

- rds_getname(): one of the existing properties of PF_RDS is that a
  connect() does not involve an implicit bind(). Note that you are basing
  the changes in rds_bind() on this behavior, thus I guess we make the
  choice of treating this as a feature, not a bug.

  Since we are choosing to treat this behavior as a feature we
  need to be consistent in rds_getname(). If we find
  (!peer and !ipv6_addr_any(rs_conn_addr)) and the socket is not yet bound, 
  the returned address (address size, address family) should be based
  on the rs_conn_addr. (Otherwise if I connect to abc::1 without doing a 
  bind(), and try to do a getsockname(), I get back a sockaddr_in?!)

- rds_cancel_sent_to() and rds_connect and rds_bind and rds_sendmsg
  As DaveM has already pointed out, we should be using sa_family to figure
  out sockaddr_in vs sockaddr_in6 (not the other way around of inspecting
  len first, and then the family- that way wont work if I pass in
  sockaddr_storage).  E.g., see inet_dgram_connect.

        if (addr_len < sizeof(uaddr->sa_family))
                return -EINVAL;

- In net/rds/rds.h;

  /* The following ports, 16385, 18634, 18635, are registered with IANA as
   * the ports to be used for RDS over TCP and UDP.  18634 is the historical

  What is "RDS over TCP and UDP"? There is no such thing as RDS-over-UDP. 
  IN fact RDS has nothing to do with UDP. The comment is confused. See next
  item below, where the comment disappears.
  
- Also in net/rds/rds.h
  Please dont define transport specific parameters like RD_CM_PORT in the 
  common rds.h header. It is unfortunate that we already have RDS_PORT there,
  and we should try to clean that up as well. NOte that RDS_TCP_PORT 
  is now in the correct transport-module-specific header (net/rds/tcp.h)
  and its unclean to drag it from there, into the common header as you are 
  doing.

  In fact I just tried to move the RDS_PORT definition into 
  net/rds/rdma_transport.h and it built just-fine. So to summarize,
  please do the following:
  1. move RDS_PORT into rdma_transport.h
  2. add RDS_CM_PORT into rdma_transport.h
  3. stop dragging RDS_TCP_PORT from its current happy home in net/rds/tcp.h
     to net/rds/rds.h

- net/rds/connection.c
  As we have discussed offline before, the fact that we cannot report
  TCP seq# etc info via the existing rds-info API is not "a bug in the
  design of MPRDS" but rather a lacking in the API design. Moreover,
  much of the useful information around the TCP socket is already
  available via procfs, TCP_INFO etc, so the info by rds-info is rarely
  used for rds-tcp- the more useful information is around the RDS socket
  itself.  So there is a bug in the comment, would be nice if you removed it.

  Also, while you are there, s/exisiting/existing, please.

General comments:
-----------------
I remain unconvinced by your global <-> link-local arguments.

For UDP sockets we can do this:

         eth0
   host1 -------------------------------------- host2
         abc::1/64                       fe80::2
                                         add abc::/64 as onlink subnet route


  host1# traceroute6 -i eth0 -s abc::1 fe80::2

You just broke this for RDS and are using polemic to defend your case,
but the main thrust of your diatribe seems to be "why would you need 
this?" I'll try to address that briefly here. 

- There may be lot of valid reasons why host2 does not want to be 
  configured with a global prefix. e.g., I only want host2 to be able 
  to talk to onlink hosts.

- RDS mandatorily requires sockets to be bound. So the normal src addr
  selection (that would have caused host1 to use a link-local to talk
  to host2) is suppressed in this case 

  This is exactly the same as a UDP socket bound to abc::1

  Note well, that one of the use-cases for RDS-TCP is to replace 
  existing infra that uses UDP for cluster-IPC. This has come up before
  on netdev:

  See https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+%5C%5BPATCH+net%5C-next+0%5C%2F6%5C%5D+kcm%5C%3A+Kernel+Connection+Multiplexor+%5C%28KCM%5C%29%22&o=newest&f=1

  so feature parity with udp is just as important as feature-parity
  for rds_rdma. 

  I hope that helps you see why we need to not break this gratuituously
  for rds-tcp.

BTW, etiquette is to cc folks who have offered review comments on the
code. Please make sure to cc me in follow-ups to this thread.

Thank you.

--Sowmini