Message-ID: <20250210145255.793e6639.pasic@linux.ibm.com>
Date: Mon, 10 Feb 2025 14:52:55 +0100
From: Halil Pasic <pasic@...ux.ibm.com>
To: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
Cc: Paolo Abeni <pabeni@...hat.com>, wenjia@...ux.ibm.com, jaka@...ux.ibm.com,
alibuda@...ux.alibaba.com, tonylu@...ux.alibaba.com,
guwen@...ux.alibaba.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, horms@...nel.org, linux-rdma@...r.kernel.org,
linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, Alexandra Winter <wintera@...ux.ibm.com>,
Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [PATCH net] net/smc: use the correct ndev to find pnetid by
pnetid table
On Fri, 10 Jan 2025 13:43:44 +0800
Guangguan Wang <guangguan.wang@...ux.alibaba.com> wrote:
> We want to use SMC in containers in a cloud environment, and encountered a
> problem when using smc_pnet with commit 890a2cb4a966. In a container there are
> several choices of container networking, such as directly using the host
> network, a virtual network like IPVLAN, veth, etc. Different choices of
> container networking result in different netdev hierarchies. Examples of the
> netdev hierarchies are shown below (eth0 and eth1 in the host
> below are the netdevs directly related to the physical device).
> _______________________________ ________________________________
> | _________________ | | _________________ |
> | |POD | | | |POD __________ | |
> | | | | | | |upper_ndev| | |
> | | eth0_________ | | | |eth0|__________| | |
> | |____| |__| | | |_______|_________| |
> | | | | | |lower netdev |
> | | | | | __|______ |
> | eth1|base_ndev| eth0_______ | | eth1| | eth0_______ |
> | | | | RDMA || | |base_ndev| | RDMA ||
> | host |_________| |_______|| | host |_________| |_______||
> ———————————————————————————————- ———————————————————————————————-
> netdev hierarchy if directly netdev hierarchy if using IPVLAN
> using host network
> _______________________________
> | _____________________ |
> | |POD _________ | |
> | | |base_ndev|| |
> | |eth0(veth)|_________|| |
> | |____________|________| |
> | |pairs |
> | _______|_ |
> | | | eth0_______ |
> | veth|base_ndev| | RDMA ||
> | |_________| |_______||
> | _________ |
> | eth1|base_ndev| |
> | host |_________| |
> ———————————————————————————————
> netdev hierarchy if using veth
>
> For some reasons, eth1 in the host is not an RDMA-attached netdevice, so a
> pnetid is needed to map eth1 (in host) to an RDMA device so that the POD can
> do SMC-R. eth1 (in host) is managed by a CNI plugin (such as Terway, a
> network management plugin for container environments), and in a cloud
> environment eth1 (in host) can be dynamically inserted by the CNI when a POD
> is created and dynamically removed by the CNI when the POD is destroyed and
> no POD is related to it anymore.
I'm pretty clueless when it comes to the details of CNI but I think
I'm barely able to follow. Nevertheless if you have the feeling that
my extrapolations are wrong, please do point it out.
> It is hard for us to configure the pnetid on eth1 (in host). So we
> configure the pnetid on the netdevice which can be seen in the POD.
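For reference, a pnetid mapping like the one described above is typically created with the smc_pnet tool from smc-tools; the interface, RDMA device, and pnetid names below are placeholders, not taken from this thread:

```shell
# Map the Ethernet interface visible in the POD and the host RDMA
# device to the same PNETID (all names here are illustrative).
smc_pnet --add --interface eth0 PNET1
smc_pnet --add --ibdevice mlx5_0 --ibport 1 PNET1

# Inspect the resulting pnetid table
smc_pnet --show
```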
Hm, this sounds like you could set the PNETID on eth1 (in host) for each of
the cases and everything would be cool (and would work), but because CNI
and the environment do not support it, or support it only in a very
inconvenient way, you are looking for a workaround where the PNETID is set
in the POD. Is that right? Or did I get something wrong?
> When doing SMC-R, both
> the container directly using the host network and the container using the veth
> network can successfully match the RDMA device, because the netdev with the
> configured pnetid is a base_ndev. But the container using IPVLAN can not
> successfully match the RDMA device and the 0x03030000 fallback happens, because
> the netdev with the configured pnetid is not a base_ndev. Additionally, configuring
> the pnetid on eth1 (in host) also does not work for matching the RDMA device
> when using the veth network and doing SMC-R in the POD.
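As an aside, the fallback mentioned here can be observed with the smcss tool from smc-tools, which lists SMC sockets and, for connections that fell back to plain TCP, shows the fallback reason code (0x03030000 corresponds to "no usable SMC device found" in the kernel's CLC decline codes):

```shell
# List all SMC sockets; connections that fell back to TCP appear with
# mode TCP and the fallback reason code, e.g. 0x03030000.
smcss --all
```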
That I guess answers my question from the first paragraph. Setting
PNETID on eth1 (host) would not be sufficient for veth. Right?
Another silly question: is making the PNETID basically a part of the Pod
definition shifting the PNETID from the realm of infrastructure (i.e.
configured by the cloud provider) to the realm of an application (i.e.
configured by the tenant)?
AFAIU veth (host) is bridged (or similar) to eth1 (host) and that is in
the host, and this is where we make sure that the requirements for SMC-R
are satisfied.
But veth (host) could be attached to eth3 which is on a network not
reachable via eth0 (host) or eth1 (host). In that case the pod could
still set PNETID on veth (POD). Or?
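The two hierarchies under discussion can be reproduced with iproute2; a minimal sketch, assuming a host interface eth1 and a POD network namespace called pod1 (both names are illustrative):

```shell
# IPVLAN case: the POD's eth0 is an upper ndev stacked on host eth1
# (the base_ndev), as in the second diagram above.
ip netns add pod1
ip link add link eth1 name ipvl0 type ipvlan mode l2
ip link set ipvl0 netns pod1
ip -n pod1 link set ipvl0 name eth0

# veth case: a pair, with one end (the POD's eth0) moved into the
# POD's namespace; each end is its own base_ndev, as in the third diagram.
ip link add vethhost type veth peer name vethpod
ip link set vethpod netns pod1
ip -n pod1 link set vethpod name eth0
```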
>
> My patch resolves the problem we encountered and also unifies the pnetid setup
> for the different network choices listed above, assuming the pnetid is not limited
> to being configured on the base_ndev directly related to the physical device
> (indeed, the current implementation does not limit it yet).
I see some problems here, but I'm afraid we see different problems. For
me, not being able to set eth0 (veth/POD)'s PNETID from the host is a
problem. Please notice that with the current implementation users can
only control the PNETID if the infrastructure does not do so in the first
place.
Can you please help me reason about this? I'm unfortunately lacking
Kubernetes skills here, and it is difficult for me to think along.
Regards,
Halil