Message-ID: <20170406163520.GA7657@obsidianresearch.com>
Date: Thu, 6 Apr 2017 10:35:21 -0600
From: Jason Gunthorpe <jgunthorpe@...idianresearch.com>
To: Sagi Grimberg <sagi@...mberg.me>
Cc: Logan Gunthorpe <logang@...tatee.com>,
Christoph Hellwig <hch@....de>,
"James E.J. Bottomley" <jejb@...ux.vnet.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Jens Axboe <axboe@...nel.dk>,
Steve Wise <swise@...ngridcomputing.com>,
Stephen Bates <sbates@...thlin.com>,
Max Gurtovoy <maxg@...lanox.com>,
Dan Williams <dan.j.williams@...el.com>,
Keith Busch <keith.busch@...el.com>, linux-pci@...r.kernel.org,
linux-scsi@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-rdma@...r.kernel.org, linux-nvdimm@...1.01.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when
dealing with p2pmem

On Thu, Apr 06, 2017 at 08:33:38AM +0300, Sagi Grimberg wrote:
>
> >>Note that the nvme completion queues are still on the host memory, so
> >>this means we have lost the ordering between data and completions as
> >>they go to different pcie targets.
> >
> >Hmm, in this simple up/down case with a switch, I think it might
> >actually be OK.
> >
> >Transactions might not complete at the NVMe device before the CPU
> >processes the RDMA completion, however due to the PCI-E ordering rules
> >new TLPs directed to the NVMe will complete after the RDMA TLPs and
> >thus observe the new data. (e.g. order preserving)
> >
> >It would be very hard to use P2P if fabric ordering is not preserved..
>
> I think it still can race if the p2p device is connected with more than
> a single port to the switch.
>
> Say it's connected via 2 legs, the bar is accessed from leg A and the
> data from the disk comes via leg B. In this case, the data is heading
> towards the p2p device via leg B (might be congested), the completion
> goes directly to the RC, and then the host issues a read from the
> bar via leg A. I don't understand what can guarantee ordering here.

Right, this is why I qualified my statement with 'simple up/down case'.
Make it any more complex and it clearly stops working sanely, but I
wouldn't worry about unusual PCI-E fabrics at this point.

> Stephen told me that this still guarantees ordering, but I honestly
> can't understand how, perhaps someone can explain to me in a simple
> way that I can understand.

AFAIK PCI-E ordering is explicitly per link, so things that need order
must always traverse the same link.

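To make the per-link point concrete, here is a toy sketch of the two-leg
scenario described above. It is only an illustration under loud
assumptions: each link is modeled as a plain FIFO with a made-up delay per
TLP, and the names (Link, single_link, two_links) are hypothetical. It
ignores real PCI-E details such as posted vs. non-posted rules and relaxed
ordering.

```python
import itertools

# Global sequence counter so that TLPs arriving at the same tick keep
# the order in which they were sent (assumption of this toy model).
_seq = itertools.count()


class Link:
    """Toy model of one link: TLPs leave in the order they entered."""

    def __init__(self):
        self.last_arrival = 0

    def send(self, now, delay, tlp):
        # A TLP cannot arrive before anything queued ahead of it on
        # this same link, no matter how small its own delay is.
        arrival = max(now + delay, self.last_arrival)
        self.last_arrival = arrival
        return (arrival, next(_seq), tlp)


def arrival_order(events):
    """Return TLP names in the order the target observes them."""
    return [tlp for _, _, tlp in sorted(events)]


def single_link():
    # Data and the later host read share one link: FIFO keeps them ordered
    # even though the data write is slow (congested).
    leg = Link()
    events = [
        leg.send(now=0, delay=10, tlp="data write"),
        leg.send(now=1, delay=2, tlp="host read"),
    ]
    return arrival_order(events)


def two_links():
    # Data travels on congested leg B, the host read on idle leg A:
    # nothing ties the two links together, so the read can win the race.
    leg_a, leg_b = Link(), Link()
    events = [
        leg_b.send(now=0, delay=10, tlp="data write"),
        leg_a.send(now=1, delay=2, tlp="host read"),
    ]
    return arrival_order(events)


if __name__ == "__main__":
    print("single link:", single_link())
    print("two links:  ", two_links())
```

In this model the single-link case always delivers the data write first,
while the two-leg case lets the host read arrive ahead of the data, which
is the race described above.
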
Jason