Message-ID: <20250422154433.GN823903@nvidia.com>
Date: Tue, 22 Apr 2025 12:44:33 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Sean Hefty <shefty@...dia.com>
Cc: "Ziemba, Ian" <ian.ziemba@....com>,
Bernard Metzler <BMT@...ich.ibm.com>,
Roland Dreier <roland@...abrica.net>,
Nikolay Aleksandrov <nikolay@...abrica.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"shrijeet@...abrica.net" <shrijeet@...abrica.net>,
"alex.badea@...sight.com" <alex.badea@...sight.com>,
"eric.davis@...adcom.com" <eric.davis@...adcom.com>,
"rip.sohan@....com" <rip.sohan@....com>,
"dsahern@...nel.org" <dsahern@...nel.org>,
"winston.liu@...sight.com" <winston.liu@...sight.com>,
"dan.mihailescu@...sight.com" <dan.mihailescu@...sight.com>,
Kamal Heib <kheib@...hat.com>,
"parth.v.parikh@...sight.com" <parth.v.parikh@...sight.com>,
Dave Miller <davem@...hat.com>,
"andrew.tauferner@...nelisnetworks.com" <andrew.tauferner@...nelisnetworks.com>,
"welch@....com" <welch@....com>,
"rakhahari.bhunia@...sight.com" <rakhahari.bhunia@...sight.com>,
"kingshuk.mandal@...sight.com" <kingshuk.mandal@...sight.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
On Fri, Apr 18, 2025 at 04:50:24PM +0000, Sean Hefty wrote:
> > On Thu, Apr 17, 2025 at 02:59:58AM +0000, Sean Hefty wrote:
> > > > I think the "Relative Addressing" Ian described is just a PD
> > > > pointing to a single job and all MRs within the PD linked to a single job. Is
> > there more than that?
> > >
> > > Relative / absolute addressing is in regard to the endpoint address.
> > > I.e. the equivalent of the QPN.
> > >
> > > With relative addressing, the QPN is relative to the job ID. So
> > > QPN=5 for job=2 and QPN=5 for job=3 may or may not be the same HW
> > > resource. A HW QP may still belong to multiple jobs, if supported by
> > > the vendor.
> >
> > Yes, but I think the key distinction is that everything is relative to, or contained
> > within, the job key so we only have one job key and every single object
> > touched by a packet must be within that job. That is the same security model
> > as PD if the PD has 1 job.
>
> Relative addressing does not constrain the QP to a single job.
> QPN=5 job=2 and QPN=4 job=3 may be the same HW QP. There's a
> per-job table/hash/tree used to map QPNs to HW queues. A multi-port
> NIC may need separate per-job tables per port.
I would say QPN=5 and QPN=4 are the objects, and they are constrained.
If there are other objects outside the PD/Job (like some kind of
shared queue) then that is a different thing.
It is why I asked if we can have the "new queue" inside different
PDs. Forget about language, there is an on-the-wire label that
identifies the QPN and that QPN must be 1:1 with the job. That can be
a direct software object, even if it does not come with any queues,
but delivers to some other queue-holding object that is outside the
PD.
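
A minimal sketch of that model, in C. All names here (struct job,
struct shared_queue, job_bind_qpn, job_lookup_qpn) are hypothetical and
come from neither the RFC patches nor any existing driver. It only
illustrates the idea that each QPN lives in exactly one job's namespace
while the queue it delivers to may sit outside the job:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QPN_PER_JOB 8	/* hypothetical table size, for illustration */

/* Hypothetical queue-holding object that lives outside any PD/job. */
struct shared_queue {
	int id;
};

/* Hypothetical per-job object: the QPN namespace is private to the job. */
struct job {
	uint32_t job_id;
	struct shared_queue *qpn_table[MAX_QPN_PER_JOB];
};

/* Bind a job-relative QPN to a queue. Returns 0, or -1 if out of range or taken. */
int job_bind_qpn(struct job *job, uint32_t qpn, struct shared_queue *q)
{
	assert(job && q);
	if (qpn >= MAX_QPN_PER_JOB || job->qpn_table[qpn])
		return -1;
	job->qpn_table[qpn] = q;
	return 0;
}

/* Resolve an on-the-wire (job, QPN) label to a queue, or NULL if unbound. */
struct shared_queue *job_lookup_qpn(const struct job *job, uint32_t qpn)
{
	assert(job);
	if (qpn >= MAX_QPN_PER_JOB)
		return NULL;
	return job->qpn_table[qpn];
}
```

With this layout, binding QPN=5 in job 2 and QPN=4 in job 3 to the same
struct shared_queue makes both lookups return the same queue, while
QPN=4 in job 2 stays unbound. The QPN objects remain constrained to
their job; only the queue is shared.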
> My guess is storage is allocated and configured prior to launching
> the compute nodes using the mechanism being defined. Once the
> compute portion of the job completes, the storage portion of the job
> is removed. I have not heard of a specific plan in this area,
> however.
That seems too vague for an OS implementation. We have to define how
"configured" works, and how the various components, for instance
kernel storage components, get permission to use the required job
keys.
> I was thinking of security key as an independent object, passed as
> an attribute when creating the top-level job. The separation is so
> a job isn't needed to apply encryption to some RDMA QP in the
> future. It seems possible to define security key as a component of
> the top-level job (and give job a new name), rather than an
> independent object.
I would probably duplicate the keys, both as part of a job and as part
of an address handle if that is the worry.
The schema doesn't need to be fully normalized; that can be harmful
when we are talking about different security contexts. A job
encryption key is some global cross-process object and an AH is a
per-process, per-uverbs context object. They should not be the same.
> > > A separate security key made more sense to me when I considered
> > > applying it to an RC QP. Additionally, an MPI/AI job may require
> > > multiple job objects, one for each IP address. (Imagine a system
> > > connected to separate networks, such that the job ID value cannot be
> > > global). A single security key can be used with all job instances.
> >
> > I haven't heard any definition of how the job id is actually matched.
>
> With absolute addressing, the QPN finds the QP through some
> table/hash/lookup.
I meant how the job ID is matched starting from the head of the
Ethernet packet.
You cannot have "separate networks with non-global job IDs" without
more strictly defining how the job is determined, by including things
like IP address pairs and possibly more.
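
To make the distinction concrete, here is a small C sketch. The struct
and function names are hypothetical, IPv4 is assumed only for brevity,
and nothing here reflects how the UE specification or any NIC actually
parses a packet. It contrasts matching on the job ID alone with
matching on the job ID scoped by the IP address pair:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical match key extracted from the head of a packet. */
struct job_match_key {
	uint32_t src_ip;	/* IPv4 assumed for brevity */
	uint32_t dst_ip;
	uint32_t job_id;
};

/* Job ID is global: only the job number has to agree. */
bool job_match_global(const struct job_match_key *pkt,
		      const struct job_match_key *job)
{
	assert(pkt && job);
	return pkt->job_id == job->job_id;
}

/* Job ID is scoped: the IP address pair is part of the identity. */
bool job_match_scoped(const struct job_match_key *pkt,
		      const struct job_match_key *job)
{
	assert(pkt && job);
	return pkt->job_id == job->job_id &&
	       pkt->src_ip == job->src_ip &&
	       pkt->dst_ip == job->dst_ip;
}
```

Two keys carrying the same job_id but different address pairs match
under job_match_global() and do not match under job_match_scoped().
Only the scoped form lets the same job number mean different jobs on
different networks.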
If the job number in the packet is port-global, or VLAN-global or
something, then it is global and we don't need to worry about
"separate networks" because that isn't possible.
Jason