Message-ID: <PH7PR21MB326304834D36451E7609D102CE999@PH7PR21MB3263.namprd21.prod.outlook.com>
Date: Fri, 29 Jul 2022 18:44:22 +0000
From: Long Li <longli@...rosoft.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: Dexuan Cui <decui@...rosoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Leon Romanovsky <leon@...nel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"shiraz.saleem@...el.com" <shiraz.saleem@...el.com>,
Ajay Sharma <sharmaajay@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
Subject: RE: [Patch v4 03/12] net: mana: Handle vport sharing between devices
> Subject: Re: [Patch v4 03/12] net: mana: Handle vport sharing between devices
>
> On Thu, Jul 21, 2022 at 05:58:39PM +0000, Long Li wrote:
> > > > "vport" is a hardware resource that can either be used by an
> > > > Ethernet device, or an RDMA device. But it can't be used by both
> > > > at the same time. The "vport" is associated with a protection
> > > > domain and doorbell, it's programmed in the hardware. Outgoing
> > > > traffic is enforced on this vport based on how it is programmed.
> > >
> > > Sure, but how is the users problem to "get this configured right"
> > > and what exactly is the user supposed to do?
> > >
> > > I would expect the allocation of HW resources to be completely
> > > transparent to the user. Why is it not?
> > >
> >
> > In the hardware, RDMA RAW_QP shares the same hardware resource (in
> > this case, the vPort in hardware table) with the ethernet NIC. When an
> > RDMA user creates a RAW_QP, we can't just shut down the ethernet. The
> > user is required to make sure the ethernet is not in used when he
> > creates this QP type.
>
> You haven't answered my question - how is the user supposed to achieve this?
The user needs to configure the network interface so that the kernel is not using it when the user creates a RAW QP on this port.
This can be done via system configuration, by not bringing this interface online at system boot, or equivalently by running "ifconfig xxx down" to bring the interface down before creating a RAW QP on this port.
>
> And now I also want to know why the ethernet device and rdma device can even
> be loaded together if they cannot share the physical port?
> Exclusivity is not a sharing model that any driver today implements.
>
This physical port limitation only applies to the RAW QP. For RC QP, the hardware doesn't have this limitation. The user can create RC QPs on a physical port up to the hardware limits independent of the Ethernet usage on the same port.
For Ethernet usage, the hardware supports only one active user on a physical port. The driver checks the port usage before programming the hardware when creating the RAW QP. Because the RDMA driver doesn't know in advance which QP type the user will create, it exposes the device with all its ports. The user may not be able to create a RAW QP on a port if this port is already in use by the kernel.
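To illustrate the kind of check described above, here is a minimal userspace sketch of an exclusive-use claim on a port. It is not the code from the patch: `struct port_state`, `vport_claim()` and `vport_release()` are hypothetical names, and the returned -EBUSY is an assumption about how a refused claim could be reported to the caller.

```c
#include <errno.h>

/* Hypothetical per-port state; not the MANA driver's actual structure. */
struct port_state {
	int vport_use_count;	/* 0 = free, 1 = claimed by Ethernet or by a RAW QP */
};

/*
 * Claim the vPort for a single exclusive user.
 * Returns 0 on success, or -EBUSY if the port already has a user.
 * A real driver would hold a lock around this check-and-increment.
 */
static int vport_claim(struct port_state *p)
{
	if (p->vport_use_count > 0)
		return -EBUSY;
	p->vport_use_count++;
	return 0;
}

/* Release a previously claimed vPort so that another user can claim it. */
static void vport_release(struct port_state *p)
{
	if (p->vport_use_count > 0)
		p->vport_use_count--;
}
```

In this sketch, bringing the Ethernet interface up and creating a RAW QP would both go through `vport_claim()`, so whichever comes second on the same port is refused until the first user calls `vport_release()`.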
As a comparison, Mellanox NICs can expose both Ethernet and RDMA RAW_QP on the same physical port to software. They can work at the same time, but with some "quirks". The RDMA RAW_QP can preempt or interfere with Ethernet traffic under certain conditions commonly used by DPDK (a heavy user of RAW_QP).
Here are two scenarios in which a Mellanox NIC port is used for both Ethernet and RAW_QP.
Scenario 1: The Ethernet loses TCP connection.
1. User A runs a program listening on a TCP port, accepts an incoming TCP connection and is communicating with the remote peer over this TCP connection.
2. User B creates an RDMA RAW_QP on the same port on the device.
3. As soon as the RAW_QP is created, the program in step 1 can't send/receive data over this TCP connection. After some period of inactivity, the TCP connection terminates.
Please note that this may also pose a security risk. User B with the RAW_QP can potentially hijack this TCP connection from the kernel by crafting the correct Ethernet frames and sending them over this QP to trick the remote peer into believing it is talking to User A.
Scenario 2: The Ethernet port state changes after RDMA RAW_QP is used on the port.
1. User runs "ifconfig ethx down" on the NIC, intending to take it offline.
2. User creates an RDMA RAW_QP on the same port on the device.
3. User destroys this RAW_QP.
4. The ethx device from step 1 reports carrier state during step 2. In many Linux distributions this brings it online without user interaction, and "ifconfig ethx" shows its state changed to "up".
The two activities on Ethernet and on RDMA RAW_QP should not happen concurrently: the user either gets unexpected behavior (Scenario 1) or needs to explicitly serialize the use (Scenario 2). In this sense, I think MANA is not materially different from how the Mellanox NICs implement the RAW_QP. IMHO, it's better to have the user explicitly decide whether to use Ethernet or RDMA RAW_QP on a specific port.
Long