Message-ID: <20211028013934.GA267985@bhelgaas>
Date: Wed, 27 Oct 2021 20:39:34 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Logan Gunthorpe <logang@...tatee.com>
Cc: Dongdong Liu <liudongdong3@...wei.com>, hch@...radead.org,
kw@...ux.com, leon@...nel.org, linux-pci@...r.kernel.org,
rajur@...lsio.com, hverkuil-cisco@...all.nl,
linux-media@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH V10 6/8] PCI/P2PDMA: Add a 10-Bit Tag check in P2PDMA

On Wed, Oct 27, 2021 at 05:41:07PM -0600, Logan Gunthorpe wrote:
> On 2021-10-27 5:11 p.m., Bjorn Helgaas wrote:
> >> @@ -532,6 +577,9 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
> >>  		map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
> >>  	}
> >>  done:
> >> +	if (pci_10bit_tags_unsupported(client, provider, verbose))
> >> +		map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
> >
> > I need to be convinced that this check is in the right spot to catch
> > all potential P2PDMA situations. The pci_p2pmem_find() and
> > pci_p2pdma_distance() interfaces eventually call
> > calc_map_type_and_dist(). But those interfaces don't actually produce
> > DMA bus addresses, and I'm not convinced that all P2PDMA users use
> > them.
> >
> > nvme *does* use them, but infiniband (rdma_rw_map_sg()) does not, and
> > it calls pci_p2pdma_map_sg().
>
> The rule in the current code is that calc_map_type_and_dist() must be
> called before pci_p2pdma_map_sg(). The calc function caches the mapping
> type in an xarray. If it was not called ahead of time,
> pci_p2pdma_map_type() will return PCI_P2PDMA_MAP_NOT_SUPPORTED, and the
> WARN_ON_ONCE() will be hit in pci_p2pdma_map_sg_attrs().

Seems like it requires fairly deep analysis to prove all this.  Is
this something we don't want to put directly in the map path because
it's a hot path, or it just doesn't fit there in the model, or ...?

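
For readers following the thread, the ordering rule described above can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the kernel implementation: the enum, struct
provider, the fixed-size array standing in for the xarray, and the
functions calc_and_cache(), lookup_map_type() and map_page() are all
hypothetical names. The tags_compatible flag stands in for whatever the
proposed 10-Bit Tag check would decide.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; these are NOT the kernel's definitions. */
enum map_type {
	MAP_UNKNOWN = 0,	/* no cached entry for this client */
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

#define MAX_CLIENTS 8

/* Per-provider cache indexed by a hypothetical client id.
 * The thread says the kernel keeps this in an xarray. */
struct provider {
	enum map_type cache[MAX_CLIENTS];
};

/* Distance-time step: decide the mapping type once and cache it.
 * The tag check is folded in here, as the patch under review does. */
static enum map_type calc_and_cache(struct provider *p, int client,
				    bool same_switch, bool tags_compatible)
{
	enum map_type t = same_switch ? MAP_BUS_ADDR : MAP_THRU_HOST_BRIDGE;

	assert(client >= 0 && client < MAX_CLIENTS);
	if (!tags_compatible)
		t = MAP_NOT_SUPPORTED;
	p->cache[client] = t;
	return t;
}

/* Map-time step: only consult the cache; a missing entry is treated
 * the same as "not supported". */
static enum map_type lookup_map_type(const struct provider *p, int client)
{
	if (client < 0 || client >= MAX_CLIENTS)
		return MAP_NOT_SUPPORTED;
	return p->cache[client] == MAP_UNKNOWN ? MAP_NOT_SUPPORTED
					       : p->cache[client];
}

/* Returns 0 on success, -1 if the pair was rejected or never vetted.
 * Per the thread, the kernel hits a WARN_ON_ONCE in the failing case. */
static int map_page(const struct provider *p, int client)
{
	return lookup_map_type(p, client) == MAP_NOT_SUPPORTED ? -1 : 0;
}
```

In this model a caller that skips calc_and_cache() gets -1 from
map_page(), which is why a check placed only in the calc step still
gates every mapping, provided all users go through the map step.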
> Both NVMe and RDMA (only used in the nvme fabrics code) do the correct
> thing here and we can be sure calc_map_type_and_dist() is called before
> any pages are mapped.
>
> The patch set I'm currently working on will ensure that
> calc_map_type_and_dist() is called before anyone maps a PCI P2PDMA page
> with dma_map_sg*().
>
> > amdgpu_dma_buf_attach() calls pci_p2pdma_distance_many() but I don't
> > know where it sets up P2PDMA transactions.
>
> The amdgpu driver hacked this in before proper support was done, but at
> least it's using pci_p2pdma_distance_many() presumably before trying any
> transfer. Though it's likely broken as it doesn't take into account the
> mapping type and thus I think it always assumes traffic goes through the
> host bridge (seeing it doesn't use pci_p2pdma_map_sg()).

What does it mean to go through the host bridge?  Obviously DMA to
system memory would go through the host bridge, but this seems
different.  Is this a "between PCI hierarchies" case like to a device
below a different root port?  I don't know what the tag rules are for
that.

> > cxgb4 and qed mention "peer2peer", but I don't know whether they are
> > related; they don't seem to use any pci_p2p.* interfaces.
>
> I'm really not sure what these drivers are doing at all. However, I
> think this is unrelated based on this old patch description[1]:
>
> Open MPI, Intel MPI and other applications don't support the iWARP
> requirement that the client side send the first RDMA message. This
> class of application connection setup is called peer-2-peer. Typically
> once the connection is setup, _both_ sides want to send data.
>
> This patch enables supporting peer-2-peer over the chelsio rnic by
> enforcing this iWARP requirement in the driver itself as part of RDMA
> connection setup.

Thanks!

> Logan
>
> [1] http://lkml.iu.edu/hypermail/linux/kernel/0804.3/1416.html