[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a7b0951-d950-1555-a5b2-622d6f7b9f8f@oracle.com>
Date: Fri, 6 Jul 2018 17:08:40 +0800
From: Ka-Cheong Poon <ka-cheong.poon@...cle.com>
To: Santosh Shilimkar <santosh.shilimkar@...cle.com>,
netdev@...r.kernel.org
Cc: davem@...emloft.net, rds-devel@....oracle.com
Subject: Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal
representation to struct in6_addr
On 07/06/2018 01:58 AM, Santosh Shilimkar wrote:
>> --- a/net/rds/bind.c
>> +++ b/net/rds/bind.c
>> @@ -42,42 +43,58 @@
>> static const struct rhashtable_params ht_parms = {
>> .nelem_hint = 768,
>> - .key_len = sizeof(u64),
>> + .key_len = RDS_BOUND_KEY_LEN,
> Do we really need the scope id to be part of the key ? With link
> local/global, do you see any collisions. Please educate me
> on the actual usecase. This can avoid bunch of changes and hence
> the question.
Yes, because link local address is not unique. The
same address can be in two different links. The scope
ID is used to differentiate that.
>
>> @@ -114,7 +132,7 @@ static int rds_add_bound(struct rds_sock *rs,
>> __be32 addr, __be16 *port)
>> rs, &addr, (int)ntohs(*port));
>> break;
>> } else {
>> - rs->rs_bound_addr = 0;
>> + rs->rs_bound_addr = in6addr_any;
> Can you elaborate why 0 is not ok ?
The type of rs_bound_addr is struct in6_addr. So 0
cannot be used.
>> int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>> {
>> struct sock *sk = sock->sk;
>> - struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
>> struct rds_sock *rs = rds_sk_to_rs(sk);
>> + struct in6_addr v6addr, *binding_addr;
>> struct rds_transport *trans;
>> + __u32 scope_id = 0;
>> int ret = 0;
>> + __be16 port;
>> + /* We only allow an RDS socket to be bound to and IPv4 address. IPv6
> s/'bound to and IPv4'/'bound to an IPv4'
Changed. But this comment will be removed in patch #2
anyway.
>> + * address support will be added later.
>> + */
>> + if (addr_len == sizeof(struct sockaddr_in)) {
>> + struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
>> +
>> + if (sin->sin_family != AF_INET ||
>> + sin->sin_addr.s_addr == htonl(INADDR_ANY))
>> + return -EINVAL;
>> + ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &v6addr);
>> + binding_addr = &v6addr;
>> + port = sin->sin_port;
>> + } else if (addr_len == sizeof(struct sockaddr_in6)) {
>> + return -EPROTONOSUPPORT;
>> + } else {
>> + return -EINVAL;
>> + }
>> lock_sock(sk);
>> - if (addr_len != sizeof(struct sockaddr_in) ||
>> - sin->sin_family != AF_INET ||
>> - rs->rs_bound_addr ||
>> - sin->sin_addr.s_addr == htonl(INADDR_ANY)) {
>> + /* RDS socket does not allow re-binding. */
>> + if (!ipv6_addr_any(&rs->rs_bound_addr)) {
>> ret = -EINVAL;
>> goto out;
>> }
> Seems new behavior you are adding. The statement itself is
> true but if we return silently for already bind address, its
> ok. May be am missing something above.
This is the existing behavior. The old code checks if
rs_bound_addr is non-zero or not. If it is non-zero,
it returns an error.
>> diff --git a/net/rds/connection.c b/net/rds/connection.c
>> index abef75d..ca72563 100644
>> --- a/net/rds/connection.c
>> +++ b/net/rds/connection.c
>
>> @@ -142,9 +151,12 @@ static void __rds_conn_path_init(struct
>> rds_connection *conn,
>> * are torn down as the module is removed, if ever.
>> */
>> static struct rds_connection *__rds_conn_create(struct net *net,
>> - __be32 laddr, __be32 faddr,
>> - struct rds_transport *trans, gfp_t gfp,
>> - int is_outgoing)
>> + const struct in6_addr *laddr,
>> + const struct in6_addr *faddr,
>> + struct rds_transport *trans,
>> + gfp_t gfp,
>> + int is_outgoing,
>> + int dev_if)
> Am just wondering if we can handle local link address case differently
> instead of pushing the interface index everywhere. Did you think about
> any alternative. This can also avoid bunch of changes again and hence
> the question. Am trying to see if we can minimize the changes to
> absolute must have to support IPv6.
If link local address is supported, scoped ID must be used.
Unless we remove the support of link local address, we need
to keep scope ID.
>> diff --git a/net/rds/ib.h b/net/rds/ib.h
>> index a6f4d7d..12f96b3 100644
>> --- a/net/rds/ib.h
>> +++ b/net/rds/ib.h
>> +union rds_ib_conn_priv {
>> + struct rds_ib_connect_private ricp_v4;
>> + struct rds6_ib_connect_private ricp_v6;
>> };
> This change was invetiable. Just add a comment saying the priviate
> data size for v6 vs v4 is different. Also for IPv6 priviate data,
> I suggest add some resrve fields for any future extensions so
> that things can be added without breaking wire protocol. With IPv4,
> we ended up in bit of mess.
Will add some comments. Unfortunately the IB private data
exchange has only a limited space. I don't think there is
any more space left for reserved field. I think we should
think of a different way to handle extensions in future.
>> --- a/net/rds/ib_cm.c
>> +++ b/net/rds/ib_cm.c
>> @@ -1,5 +1,5 @@
>> /*
>> - * Copyright (c) 2006 Oracle. All rights reserved.
>> + * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights
>> reserved.
>> *
>> * This software is available to you under a choice of one of two
>> * licenses. You may choose to be licensed under the terms of the GNU
>> @@ -35,10 +35,12 @@
>> #include <linux/slab.h>
>> #include <linux/vmalloc.h>
>> #include <linux/ratelimit.h>
>> +#include <net/addrconf.h>
>> #include "rds_single_path.h"
>> #include "rds.h"
>> #include "ib.h"
>> +#include "tcp.h"
>>
> Hmm. Why do you need to include tcp header in ib transport
> code ? If there is any common function just move to core
> common file and use it.
I think it can be removed as it is left over from earlier
changes when the IB IPv6 listener port was RDS_TCP_PORT.
Now all the port definitions are in rds.h.
>> diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
>> index fc59821..aef73e7 100644
>> --- a/net/rds/rdma_transport.c
>> +++ b/net/rds/rdma_transport.c
>
>> +/* Initialize the RDS RDMA listeners. We create two listeners for
>> + * compatibility reason. The one on RDS_PORT is used for IPv4
>> + * requests only. The one on RDS_TCP_PORT is used for IPv6 requests
>> + * only. So only IPv6 enabled RDS module will communicate using this
>> + * port.
>> + */
> You did above ti facilitate both v4 and v6 connections to co-exist
> together ? Even though potentially there is no practical usecase,
> its nice to have this possibility.
Yes, as there is no way to differentiate the IB private
data exchange. So another port is used for IPv6 IB private
exchange. Doing this allow both IPv4 and IPv6 connections
to coexist.
>> diff --git a/net/rds/send.c b/net/rds/send.c
>> index 94c7f74..6ed2e92 100644
>> --- a/net/rds/send.c
>> +++ b/net/rds/send.c
>> @@ -1081,27 +1085,59 @@ int rds_sendmsg(struct socket *sock, struct
>> msghdr *msg, size_t payload_len)
>> goto out;
>> }
>> - if (msg->msg_namelen) {
>> - /* XXX fail non-unicast destination IPs? */
>> - if (msg->msg_namelen < sizeof(*usin) || usin->sin_family !=
>> AF_INET) {
>> + namelen = msg->msg_namelen;
>> + if (namelen != 0) {
>> + if (namelen < sizeof(*usin)) {
>> + ret = -EINVAL;
>> + goto out;
>> + }
>> + switch (namelen) {
>> + case sizeof(*usin):
>> + if (usin->sin_family != AF_INET ||
>> + usin->sin_addr.s_addr == htonl(INADDR_ANY) ||
>> + usin->sin_addr.s_addr == htonl(INADDR_BROADCAST) ||
>> + IN_MULTICAST(ntohl(usin->sin_addr.s_addr))) {
>> + ret = -EINVAL;
> The idea was to fail non-unicast IP someday but can you not add this
> change as part of v6 series. We can look at it later unless you need
> this change for v6. Again the question is mainly to add only necessary
> check and leave the existing behavior as is so please re-check all below
> checks too.
This is to match the same IPv6 check. In IPv6 code, it checks
if the address is a unicast address. I can remove the IPv4
checks but if an app does send to a multicast address, the
connection will be stuck forever.
>> diff --git a/net/rds/transport.c b/net/rds/transport.c
>> index 0b188dd..c9788db 100644
>> --- a/net/rds/transport.c
>> +++ b/net/rds/transport.c
>
>> + if (ipv6_addr_v4mapped(addr)) {
>
> Dave already commented on ipv6_addr_v4mapped(). Apart from
> some of those comments questions, rest of the patch looks
> really good and nicely done. Will also look at your
> subsequent two patches and try to send you comments in next
> few days.
Do you mean using mapped address to create IPv4 connections?
I already have it in my work space. Will be in v3.
--
K. Poon
ka-cheong.poon@...cle.com
Powered by blists - more mailing lists