[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251227192303.3866551-1-zhipingz@meta.com>
Date: Sat, 27 Dec 2025 11:22:54 -0800
From: Zhiping Zhang <zhipingz@...a.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: Leon Romanovsky <leon@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
<linux-rdma@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<netdev@...r.kernel.org>, Keith Busch <kbusch@...nel.org>,
Yochai Cohen
<yochai@...dia.com>, Yishai Hadas <yishaih@...dia.com>,
Zhiping Zhang
<zhipingz@...a.com>
Subject: Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
On Thur 2025-12-04 8:10 UTC Zhiping Zhang wrote:
> On Monday 2025-11-20 13:11 UTC, Jason Gunthorpe wrote:
> >
> > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> >
> > On Wed, Nov 19, 2025 at 11:24:40PM -0800, Zhiping Zhang wrote:
> > > On Monday, November 17, 2025 at 8:00 AM, Jason Gunthorpe wrote:
> > > > Re: [RFC 2/2] Set steering-tag directly for PCIe P2P memory access
> > > >
> > > > On Thu, Nov 13, 2025 at 01:37:12PM -0800, Zhiping Zhang wrote:
> > > > > RDMA: Set steering-tag value directly in DMAH struct for DMABUF MR
> > > > >
> > > > > This patch enables construction of a dma handler (DMAH) with the P2P memory type
> > > > > and a direct steering-tag value. It can be used to register a RDMA memory
> > > > > region with DMABUF for the RDMA NIC to access the other device's memory via P2P.
> > > > >
> > > > > Signed-off-by: Zhiping Zhang <zhipingz@...a.com>
> > > > > ---
> > > > > .../infiniband/core/uverbs_std_types_dmah.c | 28 +++++++++++++++++++
> > > > > drivers/infiniband/core/uverbs_std_types_mr.c | 3 ++
> > > > > drivers/infiniband/hw/mlx5/dmah.c | 5 ++--
> > > > > .../net/ethernet/mellanox/mlx5/core/lib/st.c | 12 +++++---
> > > > > include/linux/mlx5/driver.h | 4 +--
> > > > > include/rdma/ib_verbs.h | 2 ++
> > > > > include/uapi/rdma/ib_user_ioctl_cmds.h | 1 +
> > > > > 7 files changed, 46 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/core/uverbs_std_types_dmah.c b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > index 453ce656c6f2..1ef400f96965 100644
> > > > > --- a/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > +++ b/drivers/infiniband/core/uverbs_std_types_dmah.c
> > > > > @@ -61,6 +61,27 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DMAH_ALLOC)(
> > > > > dmah->valid_fields |= BIT(IB_DMAH_MEM_TYPE_EXISTS);
> > > > > }
> > > > >
> > > > > + if (uverbs_attr_is_valid(attrs, UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL)) {
> > > > > + ret = uverbs_copy_from(&dmah->direct_st_val, attrs,
> > > > > + UVERBS_ATTR_ALLOC_DMAH_DIRECT_ST_VAL);
> > > > > + if (ret)
> > > > > + goto err;
> > > >
> > > > This should not come from userspace, the dmabuf exporter should
> > > > provide any TPH hints as part of the attachment process.
> > > >
> > > > We are trying not to allow userspace raw access to the TPH values, so
> > > > this is not a desirable UAPI here.
> > > >
> > > > Thanks for your feedback!
> > >
> > > I understand the concern about not exposing raw TPH values to
> > > userspace. To clarify, would it be acceptable to use an index-based
> > > mapping table, where userspace provides an index and the kernel
> > > translates it to the appropriate TPH value? Given that the PCIe spec
> > > allows up to 16-bit TPH values, this could require a mapping table
> > > of up to 128KB. Do you see this as a reasonable approach, or is
> > > there a preferred alternative?
> >
> > ?
> >
> > The issue here is to secure the TPH. The kernel driver that owns the
> > exporting device should control what TPH values an importing driver
> > will use.
> >
> > I don't see how an indirection table helps anything, you need to add
> > an API to DMABUF to retrieve the tph.
> I see, thanks for the clarification. Yes we can add and use another new
> API(s) for this purpose.
> Sorry for the delay: I was waiting for the final version of Leon's
> vfio-dmabuf patch series and plan to follow that for implementing the new
> API(s) needed.
> (https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com/).
>
> >
> > > Additionally, in cases where the dmabuf exporter device can handle all possible 16-bit
> > > TPH values (i.e., it has its own internal mapping logic or table), should this still be
> > > entirely abstracted away from userspace?
> >
> > I imagine the exporting device provides the raw on the wire TPH value
> > it wants the importing device to use and the importing device is
> > responsible to program it using whatever scheme it has.
> >
> > Jason
>
> Can you suggest or elaborate a bit on the schmes you see feasible?
>
> When the exporting device supports all or multiple TPH values, it is
> desirable to have userspace processes select which TPH values to use
> for the dmabuf at runtime. Actually that is the main use case of this
> patch: the user can select the TPH values to associate desired P2P
> operations on the dmabuf. The difficulty is how we can provide this
> flexibility while still aligning with kernel and security best
> practices.
>
> Zhiping
Happy holidays! I went through the vfio-dmabuf patch series and Jason's
comments once more. I think I have a proposal that addresses the concerns.
For p2p or dmabuf use cases, we pass in an ID or fd similar to CPU_ID when
allocating a dmah, and make a callback to the dmabuf exporter to get the
TPH value associated with the fd. That involves adding a new dmabuf operation
for the callback to get the TPH/tag value associated.
I can start with vfio-dmabuf and add the new dmabuf op/ABI there based on
Leon's patch. Pls let me know if you have any concerns or suggestions.
Zhiping
Powered by blists - more mailing lists