Message-ID: <9076abad-01f6-4ff4-a176-c2f4a85eb3fc@infradead.org>
Date: Wed, 29 May 2024 09:59:17 -0700
From: Randy Dunlap <rdunlap@...radead.org>
To: "Matthieu Baerts (NGI0)" <matttbe@...nel.org>, mptcp@...ts.linux.dev,
Mat Martineau <martineau@...nel.org>, Geliang Tang <geliang@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jonathan Corbet <corbet@....net>, Gregory Detal <gregory.detal@...il.com>
Cc: netdev@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 3/3] doc: new 'mptcp' page in 'networking'
Hi,
Fix a few run-on sentences:
On 5/28/24 1:09 AM, Matthieu Baerts (NGI0) wrote:
> A general documentation about MPTCP was missing since its introduction
> in v5.6.
>
> Most of what is there comes from our recently updated mptcp.dev website,
> with additional links to resources from the kernel documentation.
>
> This is a first version, mainly targeting app developers and users.
>
> Link: https://www.mptcp.dev
> Reviewed-by: Mat Martineau <martineau@...nel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@...nel.org>
> ---
> Notes:
> - v2:
> - Fix mptcp.dev link syntax.
> ---
> Documentation/networking/index.rst | 1 +
> Documentation/networking/mptcp.rst | 156 +++++++++++++++++++++++++++++++++++++
> MAINTAINERS | 2 +-
> 3 files changed, 158 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 7664c0bfe461..a6443851a142 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -72,6 +72,7 @@ Contents:
> mac80211-injection
> mctp
> mpls-sysctl
> + mptcp
> mptcp-sysctl
> multiqueue
> multi-pf-netdev
> diff --git a/Documentation/networking/mptcp.rst b/Documentation/networking/mptcp.rst
> new file mode 100644
> index 000000000000..ee0ae68ca271
> --- /dev/null
> +++ b/Documentation/networking/mptcp.rst
> @@ -0,0 +1,156 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Multipath TCP (MPTCP)
> +=====================
> +
> +Introduction
> +============
> +
> +Multipath TCP or MPTCP is an extension to the standard TCP and is described in
> +`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a
> +device to make use of multiple interfaces at once to send and receive TCP
> +packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of
> +multiple interfaces or prefer the one with the lowest latency, it also allows a
latency. It also
> +fail-over if one path is down, and the traffic is seamlessly reinjected on other
> +paths.
> +
> +For more details about Multipath TCP in the Linux kernel, please see the
> +official website: `mptcp.dev <https://www.mptcp.dev>`_.
> +
> +
> +Use cases
> +=========
> +
> +Thanks to MPTCP, being able to use multiple paths in parallel or simultaneously
> +brings new use-cases, compared to TCP:
> +
> +- Seamless handovers: switching from one path to another while preserving
> + established connections, e.g. to be used in mobility use-cases, like on
> + smartphones.
> +- Best network selection: using the "best" available path depending on some
> + conditions, e.g. latency, losses, cost, bandwidth, etc.
> +- Network aggregation: using multiple paths at the same time to have a higher
> + throughput, e.g. to combine fixed and mobile networks to send files faster.
> +
> +
> +Concepts
> +========
> +
> +Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol
> +(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of
> +a regular TCP connection that is used to transmit data through one interface.
> +Additional *subflows* can be negotiated later between the hosts. For the remote
> +host to be able to detect the use of MPTCP, a new field is added to the TCP
> +*option* field of the underlying TCP *subflow*. This field contains, amongst
> +other things, a ``MP_CAPABLE`` option that tells the other host to use MPTCP if
> +it is supported. If the remote host or any middlebox in between does not support
> +it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP
> +*option* field. In that case, the connection will be "downgraded" to plain TCP,
> +and it will continue with a single path.
> +
> +This behavior is made possible by two internal components: the path manager, and
> +the packet scheduler.
> +
> +Path Manager
> +------------
> +
> +The Path Manager is in charge of *subflows*, from creation to deletion, and also
> +address announcements. Typically, it is the client side that initiates subflows,
> +and the server side that announces additional addresses via the ``ADD_ADDR`` and
> +``REMOVE_ADDR`` options.
> +
> +Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see
> +mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``) where the
> +same rules are applied for all the connections (see: ``ip mptcp``) ; and the
> +userspace one (type ``1``), controlled by a userspace daemon (i.e. `mptcpd
(e.g. `mptcpd
[mptcpd is one example of such a daemon, so "e.g." rather than "i.e."]
> +<https://mptcpd.mptcp.dev/>`_) where different rules can be applied for each
> +connection. The path managers can be controlled via a Netlink API, see
API; see
> +netlink_spec/mptcp_pm.rst.
> +
> +To be able to use multiple IP addresses on a host to create multiple *subflows*
> +(paths), the default in-kernel MPTCP path-manager needs to know which IP
> +addresses can be used. This can be configured with ``ip mptcp endpoint`` for
> +example.
> +
> +Packet Scheduler
> +----------------
> +
> +The Packet Scheduler is in charge of selecting which available *subflow(s)* to
> +use to send the next data packet. It can decide to maximize the use of the
> +available bandwidth, only to pick the path with the lower latency, or any other
available bandwidth, to pick only the path with the lowest latency, or any other
> +policy depending on the configuration.
> +
> +Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob --
> +see mptcp-sysctl.rst.
> +
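An example might help app developers here. Both knobs (and ``net.mptcp.enabled``) can be read from userspace through procfs. A rough sketch only, assuming the usual mapping of ``net.mptcp.<name>`` to /proc/sys/net/mptcp/<name>; the helper name read_mptcp_sysctl() is made up for illustration and is not an existing API:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing API): read the net.mptcp.<name>
 * sysctl through procfs, assuming it is exposed as
 * /proc/sys/net/mptcp/<name>.
 * Returns the length of the value stored in buf, or -1 on error.
 */
int read_mptcp_sysctl(const char *name, char *buf, size_t len)
{
	char path[128];
	FILE *f;
	int n;

	if (len == 0)
		return -1;

	n = snprintf(path, sizeof(path), "/proc/sys/net/mptcp/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* knob name too long for the path buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;	/* knob not present, e.g. MPTCP not available */

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return (int)strlen(buf);
}
```

Calling it with "pm_type" or "scheduler" would return the current value as a string; a return of -1 only says the knob could not be read, not why.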
> +
> +Sockets API
> +===========
> +
> +Creating MPTCP sockets
> +----------------------
> +
> +On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the
> +``socket``:
> +
> +.. code-block:: C
> +
> + int sd = socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
> +
> +Note that ``IPPROTO_MPTCP`` is defined as ``262``.
> +
> +If MPTCP is not supported, ``errno`` will be set to:
> +
> +- ``EINVAL``: (*Invalid argument*): MPTCP is not available, on kernels < 5.6.
- ``EINVAL`` (*Invalid argument*): MPTCP is not available, on kernels < 5.6.
[no colon after ``EINVAL``, to match the two entries below]
> +- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled,
> + on kernels >= v5.6.
> +- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using
> + ``net.mptcp.enabled`` sysctl knob, see mptcp-sysctl.rst.
knob; see
> +
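It may be worth showing how an application can cope with the three errors above. A minimal sketch, assuming the application is happy to fall back to plain TCP; the helper name stream_socket_prefer_mptcp() is hypothetical, and the ``#define`` is only there for older libc headers that lack ``IPPROTO_MPTCP``:

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262	/* value given in the text above */
#endif

/*
 * Hypothetical helper: try MPTCP first, and fall back to plain TCP when
 * socket() fails with one of the three errors listed above.
 * Returns a socket descriptor, or -1 with errno set.
 * *is_mptcp is set to 1 or 0 when a socket type was selected, and is
 * left untouched on an unrelated failure.
 */
int stream_socket_prefer_mptcp(int family, int *is_mptcp)
{
	int sd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

	if (sd >= 0) {
		*is_mptcp = 1;
		return sd;
	}

	switch (errno) {
	case EINVAL:		/* kernels < 5.6 */
	case EPROTONOSUPPORT:	/* MPTCP has not been compiled */
	case ENOPROTOOPT:	/* disabled via net.mptcp.enabled */
		*is_mptcp = 0;
		return socket(family, SOCK_STREAM, IPPROTO_TCP);
	default:
		return -1;	/* unrelated failure: do not hide it */
	}
}
```

Usage would be something like ``sd = stream_socket_prefer_mptcp(AF_INET6, &is_mptcp);``. Other errors are deliberately passed through rather than retried with TCP.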
> +MPTCP is then opt-in: applications need to explicitly request it. Note that
> +applications can be forced to use MPTCP with different techniques, e.g.
> +``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTAP,
> +``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc.
> +
> +Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as
> +transparent as possible for the userspace applications.
> +
> +Socket options
> +--------------
> +
> +MPTCP supports most socket options handled by TCP. It is possible some less
> +common options are not supported, but contributions are welcome.
> +
> +Generally, the same value is propagated to all subflows, including the ones
> +created after the calls to ``setsockopt()``. eBPF can be used to set different
> +values per subflow.
> +
> +There are some MPTCP specific socket options at the ``SOL_MPTCP`` (284) level to
> +retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system
> +call:
> +
> +- ``MPTCP_INFO``: Uses ``struct mptcp_info``.
> +- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of
> + ``struct tcp_info``.
> +- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an
> + array of ``mptcp_subflow_addrs``.
> +- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an
> + array of ``struct mptcp_subflow_info`` (including the
> + ``struct mptcp_subflow_addrs``), and one pointer to an array of
> + ``struct tcp_info``, followed by the content of ``struct mptcp_info``.
> +
> +Note that at the TCP level, ``TCP_IS_MPTCP`` socket option can be used to know
> +if MPTCP is currently being used: the value will be set to 1 if it is.
> +
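A short example of the ``TCP_IS_MPTCP`` check could be useful too, e.g. to detect a fallback to TCP after connect() or accept(). A sketch only: the helper name socket_is_mptcp() is hypothetical, and the fallback ``#define`` assumes the value 43 from the uapi linux/tcp.h header, for libc headers that do not have it yet:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_IS_MPTCP
#define TCP_IS_MPTCP 43	/* assumed value, from the uapi linux/tcp.h */
#endif

/*
 * Hypothetical helper: query TCP_IS_MPTCP at the TCP level.
 * Returns 1 if MPTCP is currently being used on sd, 0 if it is not,
 * and -1 if getsockopt() failed (errno is set), e.g. because the
 * running kernel does not know this option.
 */
int socket_is_mptcp(int sd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(sd, IPPROTO_TCP, TCP_IS_MPTCP, &val, &len) < 0)
		return -1;

	return val != 0;
}
```

A caller should treat -1 as "unknown" rather than "not MPTCP", since the option may simply be missing on the running kernel.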
> +
> +Design choices
> +==============
> +
> +A new socket type has been added for MPTCP for the userspace-facing socket. The
> +kernel is in charge of creating subflow sockets: they are TCP sockets where the
> +behavior is modified using TCP-ULP.
> +
> +MPTCP listen sockets will create "plain" *accepted* TCP sockets if the
> +connection request from the client didn't ask for MPTCP, making the performance
> +impact minimal when MPTCP is enabled by default.
--
~Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html