Message-ID: <ed6bde75-2783-446e-b667-204ed55071b5@kernel.org>
Date: Fri, 7 Jun 2024 16:47:52 +0200
From: Matthieu Baerts <matttbe@...nel.org>
To: "D. Wythe" <alibuda@...ux.alibaba.com>,
Mat Martineau <martineau@...nel.org>
Cc: kgraul@...ux.ibm.com, wenjia@...ux.ibm.com, jaka@...ux.ibm.com,
wintera@...ux.ibm.com, guwen@...ux.alibaba.com, kuba@...nel.org,
davem@...emloft.net, netdev@...r.kernel.org, linux-s390@...r.kernel.org,
linux-rdma@...r.kernel.org, tonylu@...ux.alibaba.com, pabeni@...hat.com,
edumazet@...gle.com
Subject: Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC
Hi D. Wythe,

On 07/06/2024 07:09, D. Wythe wrote:
>
> On 6/7/24 5:22 AM, Mat Martineau wrote:
>> On Wed, 5 Jun 2024, D. Wythe wrote:
>>
>>> From: "D. Wythe" <alibuda@...ux.alibaba.com>
>>>
>>> This patch allows creating an SMC socket via AF_INET or AF_INET6,
>>> as in the following code:
>>>
>>> /* create v4 smc sock */
>>> v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>>>
>>> /* create v6 smc sock */
>>> v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>>>
>>> There are several reasons why we believe it is appropriate here:
>>>
>>> 1. SMC sockets actually use IPv4 (AF_INET) or IPv6 (AF_INET6)
>>> addresses. There is no AF_SMC address at all.
>>>
>>> 2. Creating the SMC socket in the AF_INET(6) path allows us to reuse
>>> the infrastructure of that path, such as the common eBPF hooks.
>>> Otherwise, SMC would have to implement it again in the AF_SMC path.
>>>
>>> Signed-off-by: D. Wythe <alibuda@...ux.alibaba.com>
>>> Tested-by: Niklas Schnelle <schnelle@...ux.ibm.com>
>>> ---
>>> include/uapi/linux/in.h | 2 +
>>> net/smc/Makefile | 2 +-
>>> net/smc/af_smc.c | 16 ++++-
>>> net/smc/smc_inet.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
>>> net/smc/smc_inet.h | 22 +++++++
>>> 5 files changed, 208 insertions(+), 3 deletions(-)
>>> create mode 100644 net/smc/smc_inet.c
>>> create mode 100644 net/smc/smc_inet.h
>>>
>>> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>>> index e682ab6..0c6322b 100644
>>> --- a/include/uapi/linux/in.h
>>> +++ b/include/uapi/linux/in.h
>>> @@ -83,6 +83,8 @@ enum {
>>> #define IPPROTO_RAW IPPROTO_RAW
>>> IPPROTO_MPTCP = 262, /* Multipath TCP connection */
>>> #define IPPROTO_MPTCP IPPROTO_MPTCP
>>> + IPPROTO_SMC = 263, /* Shared Memory Communications */
>>> +#define IPPROTO_SMC IPPROTO_SMC
>>
>> Hello,
>>
>> It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
>> value. Making IPPROTO_MAX larger does increase the size of the
>> inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
>> without increasing IPPROTO_MAX.
>>
>> Just for background: When we added IPPROTO_MPTCP, we chose 262 because
>> it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
>> wide so we knew we would not conflict with any future additions, and
>> in the case of MPTCP it was convenient that truncating the proto value
>> to 8 bits would match IPPROTO_TCP.
>>
>> - Mat
>>
>
> Hi Mat,
>
> Thank you very much for your feedback. I have always been curious about
> the origins of IPPROTO_MPTCP, and I am glad to have learned something new.
>
> Regarding the size of inet_diag_table, what you said makes sense.
> However, we would still like to keep using 263, although our rationale
> is admittedly weak: this series has been under community review for
> quite some time, we have not received any feedback about this value,
> and so we have already been using it in some user-space tools ... 🙁
>
> I would like to see what the community thinks. If everyone agrees that
> using 263 will be completely unacceptable and a disaster,
> then we will have no choice but to change it.
It will not be a disaster, but a small waste of space (even if
CONFIG_SMC is not set).
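To put rough numbers on this, here is a minimal sketch in plain C. It assumes, as Mat's comment suggests, that inet_diag_table holds one pointer-sized slot per protocol value below IPPROTO_MAX, and that IPPROTO_MAX is simply the enumerator following the highest protocol value. The SK_* names and the helper functions are made up for illustration, and the constants are hard-coded from this thread rather than taken from the installed headers.

```c
#include <stddef.h>

/* Values from the quoted diff and discussion, hard-coded on purpose. */
enum {
	SK_IPPROTO_TCP   = 6,
	SK_IPPROTO_MPTCP = 262,	/* IPPROTO_TCP + 0x100 */
	SK_IPPROTO_SMC   = 263,	/* value proposed by this patch */
};

/* IPPROTO_MAX is the enumerator right after the highest protocol value. */
static int sk_proto_max(int highest_proto)
{
	return highest_proto + 1;
}

/* Assumption: one pointer-sized slot per value below IPPROTO_MAX. */
static size_t sk_diag_table_bytes(int proto_max)
{
	return (size_t)proto_max * sizeof(void *);
}

/* Truncate a protocol value to the 8 bits used by IANA protocol numbers. */
static int sk_low8(int proto)
{
	return proto & 0xff;
}
```

Under these assumptions, sk_low8(SK_IPPROTO_MPTCP) gives back SK_IPPROTO_TCP, and moving the highest value from 262 to 263 grows the table by exactly one pointer. Any value from 256 to 261 would leave sk_proto_max() unchanged.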
Also, please note that the introduction of IPPROTO_MPTCP caused some
trouble in some userspace programs, mainly because IPPROTO_MAX got
updated and they didn't expect that. For example, a quick search on
GitHub gave me these:

https://github.com/systemd/systemd/issues/15604
https://github.com/strace/strace/issues/164
https://github.com/rust-lang/libc/issues/1896

I guess these userspace programs should now be ready for a new update,
but still, it might be better to avoid that if there is a "simple" solution.
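One way for userspace to stay robust against such updates is to avoid sizing anything by IPPROTO_MAX and to avoid assuming that a protocol value fits in 8 bits. The following is only a sketch of that idea, not code taken from any of the projects above: the EX_* constants and both function names are hypothetical, and the values are hard-coded from this thread.

```c
#include <stdint.h>

/* Values from this thread, hard-coded so the sketch does not depend on
 * which kernel headers are installed. */
enum {
	EX_PROTO_TCP   = 6,
	EX_PROTO_MPTCP = 262,
	EX_PROTO_SMC   = 263,	/* value proposed by this patch */
};

/* Look up a name with a switch instead of an array indexed up to
 * IPPROTO_MAX, so unknown or future values fall through to the default. */
static const char *ex_proto_name(int proto)
{
	switch (proto) {
	case EX_PROTO_TCP:
		return "tcp";
	case EX_PROTO_MPTCP:
		return "mptcp";
	case EX_PROTO_SMC:
		return "smc";
	default:
		return "unknown";
	}
}

/* Returns 1 if the value survives being stored in a uint8_t, else 0. */
static int ex_fits_in_u8(int proto)
{
	return proto >= 0 && proto <= UINT8_MAX;
}
```

With this shape, a new protocol value only adds a case to the switch, and ex_fits_in_u8() makes explicit that 262 and 263 would be silently truncated by any 8-bit field.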
I understand changing your userspace tools will be annoying. (On the
other hand, there is still time to do that :) )
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.