Message-ID: <1432045826.5304.6.camel@opteya.com>
Date: Tue, 19 May 2015 16:30:26 +0200
From: Yann Droneaud <ydroneaud@...eya.com>
To: Haggai Eran <haggaie@...lanox.com>
Cc: Doug Ledford <dledford@...hat.com>, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org, Liran Liss <liranl@...lanox.com>,
Guy Shapiro <guysh@...lanox.com>,
Shachar Raindel <raindel@...lanox.com>,
Yotam Kenneth <yotamke@...lanox.com>
Subject: Re: [PATCH v4 for-next 00/12] Add network namespace support in the
RDMA-CM
Hi,
On Sunday, 17 May 2015 at 08:50 +0300, Haggai Eran wrote:
> Thanks again everyone for the review comments. I've updated the patch set
> accordingly. The main changes are in the first patch to use a read-write
> semaphore instead of an SRCU, and with the reference counting of shared
> ib_cm_ids.
> Please let me know if I missed anything, or if there are other issues with
> the series.
>
> Regards,
> Haggai
>
> Changes from v3:
> - Patch 1 and 3: use read-write semaphore instead of an SRCU.
> - Patch 5:
>   * Use a direct reference count instead of a kref.
>   * Instead of adding get/put pair for ib_cm_ids, just avoid destroying an
>     id when it is still in use.
>   * Squashes these two patches together, since the first one became too
>     short:
>       IB/cm: Reference count ib_cm_ids
>       IB/cm: API to retrieve existing listening CM IDs
> - Rebase to Doug's to-be-rebased/for-4.2 branch.
>
> Changes from v2:
> - Add patch 1 to change device_mutex to an RCU.
> - Remove patch that fixed IPv4 connections to an IPv4/IPv6 listener.
> - Limit namespace related changes to RDMA CM and InfiniBand only.
> - Rebase on dledford/for-v4.2, with David Ahern's unaligned access patch.
>   * Use Michael Wang's capability functions where needed.
> - Move the struct net argument to be the first in all functions, to match
>   the networking core scheme.
> - Patch 2:
>   * Remove unwanted braces.
> - Patch 4: check the return value of ib_find_cached_pkey.
> - Patch 8: verify the address family before calling cm_save_ib_info.
> - Patch 10: use generic_net instead of a custom radix tree for having per
>   network namespace data.
> - Minor changes.
>
> Changes from v1:
> - Include patch 1 in this series.
> - Rebase for v4.1.
>
> Changes from v0:
> - Fix code review comments by Yann
> - Rebase on top of linux-3.19
>
> RDMA-CM uses IP based addressing and routing to set up RDMA connections
> between hosts. Currently, all of the IP interfaces and addresses used by
> the RDMA-CM must reside in the init_net namespace. This restricts the usage
> of containers with RDMA to only work with the host network namespace (aka
> the kernel init_net NS instance).
>
> This patchset allows using network namespaces with the RDMA-CM.
>
> Each RDMA-CM id keeps a reference to a network namespace.
>
> This reference is based on the process network namespace at the time of
> the creation of the object or inherited from the listener.
>
> This network namespace is used to perform all IP and network related
> operations. Specifically, the local device lookup, as well as the remote
> GID address resolution are done in the context of the RDMA-CM object's
> namespace. This allows outgoing connections to reach the right target, even
> if the same IP address exists in multiple network namespaces. This can
> happen if each network namespace resides on a different P_Key.
>
> Additionally, the network namespace is used to split the listener service
> ID table. From the user point of view, each network namespace has a unique,
> completely independent table of service IDs. This allows running multiple
> instances of a single service on the same machine, using containers. To
> implement this, multiple RDMA CM IDs, belonging to different namespaces may
> now share their CM ID. When a request on such a CM ID arrives, the RDMA CM
> module finds out the correct namespaces and looks for the RDMA CM ID
> matching the request's parameters.
>
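To check my understanding of the split service ID table described above, here is a minimal userspace sketch. It is not code from the patch set: `struct ns`, `struct listener`, `listener_bind()` and `listener_find()` are hypothetical names, and a fixed array stands in for whatever per-namespace storage the series actually uses. The only point is that the lookup key is the pair (namespace, service ID), so the same service ID can be bound once per namespace.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LISTENERS 16

/* Hypothetical stand-in for a network namespace (not the kernel's struct net). */
struct ns {
    int id;
};

/* One listener: the namespace it was created in plus its service ID. */
struct listener {
    const struct ns *ns;
    uint64_t service_id;
    int in_use;
};

static struct listener table[MAX_LISTENERS];

/* Return the listener bound to (ns, service_id), or NULL if there is none. */
static struct listener *listener_find(const struct ns *ns, uint64_t service_id)
{
    for (size_t i = 0; i < MAX_LISTENERS; i++) {
        if (table[i].in_use && table[i].ns == ns &&
            table[i].service_id == service_id)
            return &table[i];
    }
    return NULL;
}

/*
 * Bind service_id inside ns. Returns 0 on success, -1 if this namespace
 * already has a listener on that service ID or the table is full.
 * A listener in a different namespace never causes a conflict.
 */
static int listener_bind(const struct ns *ns, uint64_t service_id)
{
    if (listener_find(ns, service_id))
        return -1;

    for (size_t i = 0; i < MAX_LISTENERS; i++) {
        if (!table[i].in_use) {
            table[i].ns = ns;
            table[i].service_id = service_id;
            table[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}
```

With two namespaces, binding the same service ID in each succeeds, while a second bind inside one namespace is refused.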
> The functionality introduced by this series would come into play when the
> transport is InfiniBand and IPoIB interfaces are assigned to each
> namespace. Multiple IPoIB interfaces can be created and assigned to
> different RDMA-CM capable containers, for example using pipework [1].
>
> Full support for RoCE will be introduced in a later stage.
>
How does this play with iWARP? AFAIK, iWARP HCAs are aware of IP addresses
and UDP/TCP ports. Are those tied to a namespace with this patchset, or
will it be possible to use an iWARP HCA to access address/port resources
tied to a different namespace?
> The patches apply against Doug's tree for v4.2.
>
> The patchset is structured as follows:
>
> Patch 1 adds a read-write semaphore in addition to the device mutex in
> ib_core to allow traversing the client list without a deadlock in Patch 3.
>
> Patch 2 is a relatively trivial API extension, requiring the callers
> of certain ib_addr functions to provide a network namespace, as needed.
>
> Patches 3 and 4 add the ability to look up a network namespace according
> to the IP address, device and P_Key. The lookup finds the matching IPoIB
> interfaces, and safely takes a reference on the network namespace before
> returning to the caller.
>
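As I read the description of patches 3 and 4, the lookup behaves roughly like the userspace sketch below. Everything in it is an assumption made for illustration: `struct ns`, `struct ipoib_if`, `ns_lookup()` and `ns_put()` are hypothetical names, a plain integer replaces the kernel's namespace reference counting, and there is no locking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a network namespace (not the kernel's struct net). */
struct ns {
    int refcount;
};

/* Hypothetical description of one IPoIB interface. */
struct ipoib_if {
    const char *device; /* RDMA device name */
    uint16_t pkey;      /* partition key the interface sits on */
    uint32_t ipv4;      /* IPv4 address, host byte order */
    struct ns *ns;      /* namespace the interface is assigned to */
};

/*
 * Find the interface matching (device, pkey, ipv4) and return its
 * namespace with one extra reference taken, or NULL if nothing matches.
 * The caller drops the reference with ns_put().
 */
static struct ns *ns_lookup(const struct ipoib_if *ifs, size_t n,
                            const char *device, uint16_t pkey, uint32_t ipv4)
{
    for (size_t i = 0; i < n; i++) {
        if (ifs[i].pkey == pkey && ifs[i].ipv4 == ipv4 &&
            strcmp(ifs[i].device, device) == 0) {
            ifs[i].ns->refcount++;
            return ifs[i].ns;
        }
    }
    return NULL;
}

/* Drop a reference obtained from ns_lookup(). */
static void ns_put(struct ns *ns)
{
    ns->refcount--;
}
```

Two interfaces with the same IP address on the same device then resolve to different namespaces as long as their P_Keys differ, which is the case the cover letter mentions.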
> Patches 5-6 make necessary changes to the CM layer, to allow sharing of a
> single CM ID between multiple RDMA CM IDs. This includes adding a
> reference count to ib_cm_id structs, adding an API to either create a new
> CM ID or use an existing one, and exposing the service ID to ib_cm clients.
>
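For the shared CM IDs, the "create a new CM ID or use an existing one" API combined with "avoid destroying an id when it is still in use" could look like the following userspace sketch. It is only an illustration under my own assumptions: `struct cm_id`, `cm_id_get_or_create()` and `cm_id_destroy()` are hypothetical names, not the functions added by the series, and the sketch is single-threaded.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a listening CM ID shared by several users. */
struct cm_id {
    uint64_t service_id;
    int refcount;       /* direct reference count, as in the v4 changelog */
    struct cm_id *next;
};

static struct cm_id *cm_ids; /* list of existing listening CM IDs */

/* Return the existing CM ID for service_id, or create one if none exists. */
static struct cm_id *cm_id_get_or_create(uint64_t service_id)
{
    struct cm_id *id;

    for (id = cm_ids; id; id = id->next) {
        if (id->service_id == service_id) {
            id->refcount++;
            return id;
        }
    }

    id = malloc(sizeof(*id));
    if (!id)
        return NULL;
    id->service_id = service_id;
    id->refcount = 1;
    id->next = cm_ids;
    cm_ids = id;
    return id;
}

/*
 * Drop one user of the CM ID. The ID is only unlinked and freed when the
 * last user is gone. Returns 1 if it was destroyed, 0 if still in use.
 */
static int cm_id_destroy(struct cm_id *id)
{
    struct cm_id **p;

    if (--id->refcount > 0)
        return 0;

    for (p = &cm_ids; *p; p = &(*p)->next) {
        if (*p == id) {
            *p = id->next;
            break;
        }
    }
    free(id);
    return 1;
}
```

Two callers asking for the same service ID get the same object, and only the second destroy call actually frees it.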
> Patches 7-8 do some preliminary refactoring to the rdma_cm module. Patch 7
> refactors the logic that extracts the IP address from a connect request to
> allow reuse by the namespace lookup code further on. Patch 8 changes the
> way the RDMA CM module creates CM IDs, to avoid relying on the compare_data
> feature of ib_cm. This feature associates a single compare_data struct
> with each ib_cm_id, so it cannot be used when sharing CM IDs.
>
> Patches 9-12 add proper namespace support to the RDMA-CM module. This
> includes adding multiple port space tables, sharing ib_cm_ids between
> rdma_cm_ids, adding a network namespace parameter, and finally retrieving
> the namespace from the creating process.
>
Regards.
--
Yann Droneaud
OPTEYA