[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3840993e-5e20-4657-8d5e-e0e44c613a40@enfabrica.net>
Date: Wed, 12 Mar 2025 11:20:27 +0200
From: Nikolay Aleksandrov <nikolay@...abrica.net>
To: Sean Hefty <shefty@...dia.com>, Bernard Metzler <BMT@...ich.ibm.com>,
Parav Pandit <parav@...dia.com>, Leon Romanovsky <leon@...nel.org>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"shrijeet@...abrica.net" <shrijeet@...abrica.net>,
"alex.badea@...sight.com" <alex.badea@...sight.com>,
"eric.davis@...adcom.com" <eric.davis@...adcom.com>,
"rip.sohan@....com" <rip.sohan@....com>,
"dsahern@...nel.org" <dsahern@...nel.org>,
"roland@...abrica.net" <roland@...abrica.net>,
"winston.liu@...sight.com" <winston.liu@...sight.com>,
"dan.mihailescu@...sight.com" <dan.mihailescu@...sight.com>,
Kamal Heib <kheib@...hat.com>,
"parth.v.parikh@...sight.com" <parth.v.parikh@...sight.com>,
Dave Miller <davem@...hat.com>, "ian.ziemba@....com" <ian.ziemba@....com>,
"andrew.tauferner@...nelisnetworks.com"
<andrew.tauferner@...nelisnetworks.com>, "welch@....com" <welch@....com>,
"rakhahari.bhunia@...sight.com" <rakhahari.bhunia@...sight.com>,
"kingshuk.mandal@...sight.com" <kingshuk.mandal@...sight.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
On 3/11/25 7:11 PM, Sean Hefty wrote:
>> I am not sure if a new subsystem is what this RFC calls for, but rather a
>> discussion about the proper integration of a new RDMA transport into the
>> Linux kernel.
>>
>> Ultra Ethernet Transport is probably not just another transport up for easy
>> integration into the current RDMA subsystem.
>> First of all, its design does not follow the well-known RDMA verbs model
>> inherited from InfiniBand, which has largely shaped the current structure of
>> the RDMA subsystem. While having send, receive and completion queues (and
>> completion counters) to steer message exchange, there is no concept of a
>> queue pair. Endpoints can span multiple queues, can have multiple peer
>> addresses.
>> Communication resources sharing is controlled in a different way than within
>> protection domains. Connections are ephemeral, created and released by the
>> provider as needed. There are more differences. In a nutshell, the UET
>> communication model is trimmed for extreme scalability. Its API semantics
>> follow libfabrics, not RDMA verbs.
>>
>> I think Nik gave us a first still incomplete look at the UET protocol engine to
>> help us understand some of the specifics.
>> It's just the lower part (packet delivery). The implementation of the upper part
>> (resource management, communication semantics, job management) may
>> largely depend on the environment we all choose.
>>
>> IMO, integrating UET with the current RDMA subsystem would ask for its
>> extension to allow exposing all of UETs intended functionality, probably
>> starting with a more generic RDMA device model than current ib_device.
>>
>> The different API semantics of UET may further call for either extending verbs
>> to cover it as well, or exposing a new non-verbs API (libfabrics), or both.
>
> Reading through the submissions, what I found lacking is a description of some higher-level plan. I don't easily see how to relate this series to NICs that may implement UET in HW.
>
> Should the PDS be viewed as a partial implementation of a SW UET 'device', similar to soft RoCE or iWarp? If so, having a description of a proposed device model seems like a necessary first step.
>
Hi Sean,
To quote the cover letter:
"...As there
isn't any UET hardware available yet, we introduce a software
device model which implements the lowest sublayer of the spec - PDS..."
and
"The plan is to have that split into core Ultra Ethernet module
(ultraeth.ko) which is responsible for managing the UET contexts, jobs
and all other common/generic UET configuration, and the software UET
device model (uecon.ko) which implements the UET protocols for
communication in software (e.g. the PDS will be a part of uecon) and is
represented by a UDP tunnel network device."
So as I said, it is in very early stage, but we plan to split this into
core UET code and uecon software device model that implements the UEC
specs.
> If, instead, the PDS should be viewed more along the lines of a partial RDS-like path, then that changes the uapi.
>
> Or, am I not viewing this series as intended at all?
>
> It is almost guaranteed that there will be NICs which will support both RoCE and UET, and it's not farfetched to think that an app may use both simultaneously. IMO, a common device model is ideal, assuming exposing a device model is the intent.
>
That is the goal and we're working on UET kernel device API as I've
noted in the cover letter.
> I agree that different transport models should not be forced together unnaturally, but I think that's solvable. In the end, the application developer is exposed to libfabric naming anyway. Besides, even a repurposed RDMA name is still better than the naming used within OpenMPI. :)
>
> - Sean
Cheers,
Nik
Powered by blists - more mailing lists