netdev - RE: [RFC PATCH 00/13] Ultra Ethernet driver introduction

Message-ID:
 <DM6PR12MB4313E0C29EA74A94748EBD53BDBF2@DM6PR12MB4313.namprd12.prod.outlook.com>
Date: Fri, 18 Apr 2025 16:50:24 +0000
From: Sean Hefty <shefty@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: "Ziemba, Ian" <ian.ziemba@....com>, Bernard Metzler <BMT@...ich.ibm.com>,
	Roland Dreier <roland@...abrica.net>, Nikolay Aleksandrov
	<nikolay@...abrica.net>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"shrijeet@...abrica.net" <shrijeet@...abrica.net>, "alex.badea@...sight.com"
	<alex.badea@...sight.com>, "eric.davis@...adcom.com"
	<eric.davis@...adcom.com>, "rip.sohan@....com" <rip.sohan@....com>,
	"dsahern@...nel.org" <dsahern@...nel.org>, "winston.liu@...sight.com"
	<winston.liu@...sight.com>, "dan.mihailescu@...sight.com"
	<dan.mihailescu@...sight.com>, Kamal Heib <kheib@...hat.com>,
	"parth.v.parikh@...sight.com" <parth.v.parikh@...sight.com>, Dave Miller
	<davem@...hat.com>, "andrew.tauferner@...nelisnetworks.com"
	<andrew.tauferner@...nelisnetworks.com>, "welch@....com" <welch@....com>,
	"rakhahari.bhunia@...sight.com" <rakhahari.bhunia@...sight.com>,
	"kingshuk.mandal@...sight.com" <kingshuk.mandal@...sight.com>,
	"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>, "kuba@...nel.org"
	<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: RE: [RFC PATCH 00/13] Ultra Ethernet driver introduction

> On Thu, Apr 17, 2025 at 02:59:58AM +0000, Sean Hefty wrote:
> > > I think the "Relative Addressing" Ian described is just a PD
> > > pointing to a single job and all MRs within the PD linked to a
> > > single job. Is there more than that?
> >
> > Relative / absolute addressing is in regard to the endpoint address.
> > I.e. the equivalent of the QPN.
> >
> > With relative addressing, the QPN is relative to the job ID.  So
> > QPN=5 for job=2 and QPN=5 for job=3 may or may not be the same HW
> > resource.  A HW QP may still belong to multiple jobs, if supported by
> > the vendor.
> 
> Yes, but I think the key distinction is that everything is relative to, or contained
> within, the job key, so we only have one job key and every single object
> touched by a packet must be within that job. That is the same security model
> as PD if the PD has 1 job.

Relative addressing does not constrain the QP to a single job.  QPN=5 job=2 and QPN=4 job=3 may be the same HW QP.  There's a per-job table/hash/tree used to map QPNs to HW queues.  A multi-port NIC may need separate per-job tables per port.
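
As a rough sketch of the per-job lookup described above, assuming a plain dictionary keyed by (port, job ID) stands in for whatever table/hash/tree the hardware uses. The names `RelativeQpTable`, `bind`, and `lookup` are hypothetical and do not come from the patch set:

```python
from typing import Dict, Optional, Tuple


class RelativeQpTable:
    """Per-port, per-job map from a job-relative QPN to a HW queue index."""

    def __init__(self) -> None:
        # Key: (port, job_id) -> {relative QPN -> HW queue index}
        self._tables: Dict[Tuple[int, int], Dict[int, int]] = {}

    def bind(self, port: int, job_id: int, qpn: int, hw_queue: int) -> None:
        self._tables.setdefault((port, job_id), {})[qpn] = hw_queue

    def lookup(self, port: int, job_id: int, qpn: int) -> Optional[int]:
        # Unknown job, or unknown QPN within the job: no queue is selected.
        return self._tables.get((port, job_id), {}).get(qpn)


table = RelativeQpTable()
table.bind(port=0, job_id=2, qpn=5, hw_queue=17)
table.bind(port=0, job_id=3, qpn=4, hw_queue=17)  # same HW QP, different job
table.bind(port=0, job_id=3, qpn=5, hw_queue=42)  # same QPN, different HW QP
```

Here (job=2, QPN=5) and (job=3, QPN=4) resolve to the same HW queue, while QPN=5 means something different in each job.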

(Let's ignore how the QP addressing gets assigned...)

> > As an example, assigning MRs to jobs allows the server to setup RMA
> > buffers with access restricted to that job.
> >
> > I have no idea how the receiver plans to enable sending back a response.
> 
> Or get access to the new job id, which seems like a more important question
> for the OS. I think I understand that there must be some privileged entity that
> grants fine grained access to jobs, but I have not seen any detail on how that
> would actually work inside the OS to cover all these cases.
> 
> Does this all-listening process have to do some kind of DBUS operation to
> request access to a job and get back a job FD? Something else? Does anyone
> have a plan in mind?
> 
> MPI seems to have a more obvious design where the launcher could be
> privileged and pass a job FD to its children. The global MPI scheduler could
> allocate the network-global job ids. Unprivileged processes never request a job on
> the fly.

My guess is storage is allocated and configured prior to launching the compute nodes using the mechanism being defined.  Once the compute portion of the job completes, the storage portion of the job is removed.  I have not heard of a specific plan in this area, however.

> > The second feature is called scalable
> > endpoints.  A scalable endpoint has multiple receive queues, which are
> > directly addressable by the peer.  Different jobs could target
> > different receive queues.
> 
> That's just a new queue with different addressing rules. If the new queue is
> created inside a new PD from its endpoint are we OK then?

I... think so.

> > I've gone back and forth between separating and combining the
> > 'security key' and job objects.  Today I opted for separate, more
> > focused objects.  Tomorrow, who knows?  Job is where addressing
> > information goes.
> 
> I don't know about combining, but it seems like security key and addressing
> are sub objects of the top level job? Is there any reason to share a security key
> with two jobs???

I doubt sharing a security key between HPC jobs is needed.  I think of the set of addresses being a component of the top-level job.  Individual addresses are sub-objects, if that's what you mean.

I was thinking of security key as an independent object, passed as an attribute when creating the top-level job.  The separation is so a job isn't needed to apply encryption to some RDMA QP in the future.  It seems possible to define security key as a component of the top-level job (and give job a new name), rather than an independent object.

> > A separate security key made more sense to me when I considered
> > applying it to an RC QP.  Additionally, an MPI/AI job may require
> > multiple job objects, one for each IP address.  (Imagine a system
> > connected to separate networks, such that the job ID value cannot be
> > global).  A single security key can be used with all job instances.
> 
> I haven't heard any definition of how the job id is actually matched.

I define a job key.  The job key provides a secure way to select the job ID carried in the transport.  A job key references a PD and is specified as part of any transfer.

A job key may be provided when creating a MR.  If so, the job *ID* is stored with the MR.  The PD of the job key and MR must be the same.
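
To illustrate the MR creation rule, here is a minimal sketch. The `JobKey`, `MemoryRegion`, and `create_mr` names are made up for the example and are not the proposed API; integer PD handles are an assumption too:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobKey:
    pd: int      # protection domain the job key references
    job_id: int  # job ID carried in the transport


@dataclass(frozen=True)
class MemoryRegion:
    pd: int
    rkey: int
    job_id: Optional[int]  # None: the MR is not restricted to a job


def create_mr(pd: int, rkey: int,
              job_key: Optional[JobKey] = None) -> MemoryRegion:
    if job_key is None:
        return MemoryRegion(pd=pd, rkey=rkey, job_id=None)
    if job_key.pd != pd:
        raise PermissionError("job key and MR must belong to the same PD")
    # Only the job ID is stored with the MR, not the job key itself.
    return MemoryRegion(pd=pd, rkey=rkey, job_id=job_key.job_id)
```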

With absolute addressing, the QPN finds the QP through some table/hash/lookup.  An rkey locates a MR.  If the MR has a valid job ID associated with it, it's compared with the job ID from the transport.  If those match, the transfer is valid.  This check is in addition to verifying the QP and MR belong to the same PD.
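
The absolute-addressing checks could be sketched as follows, assuming dictionaries stand in for the QPN and rkey lookups. `Qp`, `Mr`, and `validate_absolute` are hypothetical names, and a real implementation would live in hardware or the driver:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Qp:
    pd: int


@dataclass(frozen=True)
class Mr:
    pd: int
    job_id: Optional[int]  # None: no job restriction on this MR


def validate_absolute(qps: Dict[int, Qp], mrs: Dict[int, Mr],
                      qpn: int, rkey: int, pkt_job_id: int) -> bool:
    qp = qps.get(qpn)   # QPN -> QP through a global table
    mr = mrs.get(rkey)  # rkey -> MR
    if qp is None or mr is None:
        return False
    if qp.pd != mr.pd:  # the usual QP/MR PD check still applies
        return False
    # The job check only applies when the MR has a job ID associated with it.
    return mr.job_id is None or mr.job_id == pkt_job_id


qps = {5: Qp(pd=1)}
mrs = {
    0x10: Mr(pd=1, job_id=2),     # restricted to job 2
    0x11: Mr(pd=1, job_id=None),  # no job restriction
    0x12: Mr(pd=9, job_id=2),     # different PD than the QP
}
```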

With relative addressing, the job ID selects some table/hash, which identifies the QP.  Job matching is a natural part of mapping the QPN to the QP.  Job-related checks against target MRs are the same as above.

There are other ways these checks may be implemented, including tighter restrictions on what MRs a QP may access.  But at least the above checks should hold.

Generalizing the above to remove UET addressing, a QP may either receive from any job or only those jobs that it is associated with.  A QP may belong to multiple jobs.  And a MR may be restricted to access by a single job.  Vendors may optimize their implementations around which features to support.  E.g. limit a QP to 1 job, no per job MRs, etc. 
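
A small sketch of that generalized model, with the caveat that `QpPolicy`, `MrPolicy`, and `transfer_allowed` are illustrative names only, and that a vendor may support just a subset of these combinations:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class QpPolicy:
    # None: the QP receives from any job.  Otherwise only the listed jobs.
    jobs: Optional[FrozenSet[int]] = None

    def accepts(self, job_id: int) -> bool:
        return self.jobs is None or job_id in self.jobs


@dataclass(frozen=True)
class MrPolicy:
    # None: any job may access the MR.  Otherwise a single job only.
    job_id: Optional[int] = None

    def accepts(self, job_id: int) -> bool:
        return self.job_id is None or self.job_id == job_id


def transfer_allowed(qp: QpPolicy, mr: MrPolicy, job_id: int) -> bool:
    return qp.accepts(job_id) and mr.accepts(job_id)


any_job_qp = QpPolicy()                    # receives from any job
multi_job_qp = QpPolicy(frozenset({2, 3}))  # belongs to jobs 2 and 3
job2_mr = MrPolicy(job_id=2)               # restricted to job 2
open_mr = MrPolicy()                       # no job restriction
```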

- Sean