Message-ID: <CAM0EoMkdTuJ8Oe+S2L+t6m3Q4UdMfJhFFhdjpdZbD7HLAadsdg@mail.gmail.com>
Date: Tue, 8 Apr 2025 10:16:45 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Leon Romanovsky <leon@...nel.org>, Nikolay Aleksandrov <nikolay@...abrica.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>, Shrijeet Mukherjee <shrijeet@...abrica.net>, alex.badea@...sight.com,
eric.davis@...adcom.com, rip.sohan@....com, David Ahern <dsahern@...nel.org>,
bmt@...ich.ibm.com, roland@...abrica.net,
Winston Liu <winston.liu@...sight.com>, dan.mihailescu@...sight.com, kheib@...hat.com,
parth.v.parikh@...sight.com, davem@...hat.com, ian.ziemba@....com,
andrew.tauferner@...nelisnetworks.com, welch@....com,
rakhahari.bhunia@...sight.com, kingshuk.mandal@...sight.com,
linux-rdma@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: Netlink vs ioctl WAS(Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction

Sorry, I was too distracted elsewhere..
On Wed, Mar 26, 2025 at 11:50 AM Jason Gunthorpe <jgg@...dia.com> wrote:
>
> On Tue, Mar 25, 2025 at 10:12:49AM -0400, Jamal Hadi Salim wrote:
>
[Trimmed for brevity..]
> > For a read() to fail at say copy_to_user() feels like your app or
> > system must be in really bad shape.
>
> Yes, but still the semantic we want is that if a creation ioctl
> returns 0 (success) then the object exists and if it returns any error
> code then the creation was a NOP.
>
> > A contingency plan could be to replay the message from the app/control
> > plane and hope you get an "object doesn't exist" kind of message for a
> > failed destroy msg.
>
> Nope, it's racy, it must be multi-threaded safe. Another thread could
> have created and re-used the object ID.
>
> > IOW, while unwinding is more honorable, unless it comes for cheap it
> > may not be worth it.
>
> It was cheap
>
> > Regardless: How would RDMA unwind in such a case?
>
> The object infrastructure takes care of this with a three step object
> creation protocol and some helpers.
>
[..]
> > When you say "driver" you mean "control/provisioning plane" activity
> > between a userspace control app and kernel objects which likely
> > extend
>
> No, I literally mean driver.
>
> The user of this HW will not do something like socket() as standard
> system call abstracted by the kernel. Instead it makes a library call
> ib_create_qp() which goes into a library with the userspace driver
> components. The abstraction is now done in userspace. The library
> figures out what HW the kernel has and loads a userspace driver
> component with a driver_create_qp() op that does more processing and
> eventually calls the kernel.
>
> It is "control path" in the sense that it is slow path creating
> objects for data transfer, but the purpose of most of the actions is
> actually setting up for data plane operations.
>
OK, if I read this correctly so far, it seems you have some (three-phase)
transactional approach?
An earlier phase, with this userspace driver interaction, guarantees that
the needed resources are available for the subsequent phases to use..
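
A minimal sketch of how such a three-step create path could look, assuming
a toy in-memory object table. All names here (obj_reserve, obj_commit,
obj_abort, obj_create, reply_fails) are hypothetical and are not the actual
RDMA object infrastructure; locking is omitted, so this only illustrates
the ordering, not the multi-threaded safety:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical object table; illustrative only, not the RDMA uverbs code. */
enum slot_state { SLOT_FREE, SLOT_RESERVED, SLOT_LIVE };

struct slot {
    enum slot_state state;
    int payload;
};

#define NSLOTS 8
static struct slot table[NSLOTS];

/* Step 1: reserve an ID. A reserved slot cannot be handed out again and
 * is not yet visible to lookups. */
static int obj_reserve(void)
{
    for (int id = 0; id < NSLOTS; id++) {
        if (table[id].state == SLOT_FREE) {
            table[id].state = SLOT_RESERVED;
            return id;
        }
    }
    return -ENOSPC;
}

/* Step 3a: publish the object. This step must not be able to fail. */
static void obj_commit(int id)
{
    table[id].state = SLOT_LIVE;
}

/* Step 3b: drop the reservation so the whole call ends up a no-op. */
static void obj_abort(int id)
{
    table[id].payload = 0;
    table[id].state = SLOT_FREE;
}

/* Lookups only ever see committed objects. */
static struct slot *obj_lookup(int id)
{
    if (id < 0 || id >= NSLOTS || table[id].state != SLOT_LIVE)
        return NULL;
    return &table[id];
}

/* reply_fails is an assumption standing in for a failing copy_to_user(). */
static int obj_create(int payload, int reply_fails, int *out_id)
{
    int id = obj_reserve();
    if (id < 0)
        return id;

    /* Step 2: everything that can fail, including the reply to userspace. */
    table[id].payload = payload;
    if (reply_fails) {
        obj_abort(id);
        return -EFAULT;
    }
    *out_id = id;

    obj_commit(id);
    return 0;
}
```

The point of the ordering is that the reply is written before the commit,
so an error return always means nothing was created, and no other caller
can observe or reuse the ID while it is only reserved.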
> > If my reading is right, some comments:
> > 1) You can achieve this fine with netlink. My view of the model is you
> > would have a T (call it VendorData, which is defined within the
> > common namespace) that puts the vendor specific TLVs within a
> > hierarchy.
>
> Yes, that was a direction that was suggested here too. But when we got
> to micro optimizing the ioctl ABI format it became clear there was
> significant advantage to keeping things one level and not trying to do
> some kind of nesting. This also gives a nice simple in-kernel API for
> working with method arguments, it is always the same. We don't have
> different APIs depending on driver/common callers.
>
Agreed, a flat namespace is a win as long as the modelling doesn't have
to be squished into a square-peg-in-a-round-hole abstraction.
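
To illustrate what "one level" could mean here, a minimal sketch in which
common and vendor attributes share a single ID space and one accessor
serves both kinds of caller. The attribute names, the 0x1000 vendor base
and the struct layout are assumptions made for illustration; they are not
the RDMA ioctl ABI or any netlink definition:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute IDs: common and vendor attributes live in one
 * flat space instead of vendor TLVs being nested under a container. */
enum {
    ATTR_QP_TYPE      = 1,       /* common */
    ATTR_QP_DEPTH     = 2,       /* common */
    ATTR_VENDOR_BASE  = 0x1000,  /* assumed start of the driver range */
    ATTR_VND_DOORBELL = ATTR_VENDOR_BASE + 1,
};

struct attr {
    uint16_t id;
    uint32_t val;
};

/* The same lookup is used by common and driver code: there is no nested
 * attribute to locate and walk first. */
static const struct attr *attr_find(const struct attr *attrs, size_t n,
                                    uint16_t id)
{
    for (size_t i = 0; i < n; i++)
        if (attrs[i].id == id)
            return &attrs[i];
    return NULL;
}

/* Vendor attributes are told apart by range only. */
static int attr_is_vendor(uint16_t id)
{
    return id >= ATTR_VENDOR_BASE;
}
```

With a nested layout the driver would first have to find the container
attribute and then parse its payload with a second pass; here both callers
go through attr_find() on the same array.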
> > 2) Hopefully the vendor extensions are in the minority. Otherwise the
> > complexity of someone writing an app to control multiple vendors would
> > be challenging over time as different vendors add more attributes.
>
> Nope, it is about 50/50, and there is not a challenge because the
> methodology is everyone uses the *same* userspace driver code. It is
> too complicated for people to reasonably try to rewrite.
>
> > I can't imagine a commonly used utility like iproute2/tc being
> > invoked with "when using broadcom then use foo=x bar=y" apply but
> > when using intel use "goo=x-1 and gah=y-2".
>
> Right, it doesn't make sense for a tool like iproute, but we aren't
> building anything remotely like iproute.
>
My point was about the API. I don't know enough, so pardon my ignorance. My
basic assumption is that there is common cross-vendor tooling and that
deployments may have to be multi-vendor. If that assumption is wrong
then my concern is not valid.
If my assumption is correct, whatever provisioning app is involved
needs to keep track of the multiple vendor interfaces, which means
the code will have to understand different semantics across vendors.
> > 3) A Pro/con to #2 depending on which lens you use: it could be
> > "innovation" or "vendor lock-in" - depends on the community i.e. on the
> > one hand a vendor could add features faster and is not bottlenecked by
> > endless mailing list discussions but otoh, said vendor may not be in
> > any hurry to move such features to the common path (because it gives
> > them an advantage).
>
> There is no community advantage to the common kernel path.
>
> The users all use the library, the only thing that matters is how
> accessible the vendor has made their unique ideas to the library
> users.
>
> For instance, if the user is running a MPI application and the vendor
> makes standard open source MPI 5% faster with some unique HW
> innovation should anyone actually care about the "common path" deep,
> deep below MPI?
>
I would say they shouldn't care, because the customer gets the benefit.
But on the flip side, again, that is counting on the goodwill of the
vendor.
cheers,
jamal