Message-ID: <7fcc3ac8-8b96-90f5-3942-87f999c7499d@grimberg.me>
Date:   Mon, 10 Apr 2017 11:29:57 +0300
From:   Sagi Grimberg <sagi@...mberg.me>
To:     Stephen Bates <sbates@...thlin.com>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>
Cc:     Logan Gunthorpe <logang@...tatee.com>,
        Christoph Hellwig <hch@....de>,
        "James E.J. Bottomley" <jejb@...ux.vnet.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Jens Axboe <axboe@...nel.dk>,
        Steve Wise <swise@...ngridcomputing.com>,
        Max Gurtovoy <maxg@...lanox.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Keith Busch <keith.busch@...el.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-nvdimm@...1.01.org" <linux-nvdimm@...1.01.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when
 dealing with p2pmem


> Sagi
>
> As long as legA, legB and the RC are all connected to the same switch then ordering will be preserved (I think many other topologies also work). Here is how it would work for the problem case you are concerned about (which is a read from the NVMe drive).
>
> 1. Disk device DMAs out the data to the p2pmem device via a string of PCIe MemWr TLPs.
> 2. Disk device writes to the completion queue (in system memory) via a MemWr TLP.
> 3. The last of the MemWrs from step 1 might have been stalled in the PCIe switch due to congestion, but if so they are stalled in the egress path of the switch for the p2pmem port.
> 4. The RC determines the IO is complete when the TLP associated with step 2 updates the memory associated with the CQ. It issues some operation to read the p2pmem.
> 5. Regardless of whether the MemRd TLP comes from the RC or another device connected to the switch, it is queued in the egress queue for the p2pmem FIO behind the last DMA TLP (from step 1).
> PCIe ordering ensures that this MemRd cannot overtake the MemWr (Reads can never pass writes).
> Therefore the MemRd can never get to the p2pmem device until after the last DMA MemWr has.

What you are saying is surprising to me. Does the switch need to
preserve ordering across different switch ports?

You are suggesting that there is *switch-wide* state which guarantees
that MemRds never pass MemWrs across all the switch ports? That is a
very non-trivial statement...
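
To make the claim under discussion concrete, here is a toy Python sketch
of the five quoted steps. It is only an illustration, not a description
of any real switch: the ToySwitch class, its route/drain methods, and the
port names are all hypothetical. The model assumes each TLP is placed in
the egress FIFO of its destination port in the order it was issued, and
that each egress FIFO drains strictly in order. Whether real hardware
gives that guarantee across ports is exactly the open question.

```python
from collections import deque


class ToySwitch:
    """Hypothetical model: one in-order egress FIFO per switch port."""

    def __init__(self, ports):
        self.egress = {port: deque() for port in ports}
        self.stalled = set()  # ports whose egress path is congested

    def route(self, kind, dst, tag):
        # Assumption: TLPs are enqueued at the destination egress FIFO
        # in the same order in which they were issued.
        self.egress[dst].append((kind, tag))

    def drain(self, port):
        # A stalled port delivers nothing; otherwise deliver in FIFO order.
        if port in self.stalled:
            return []
        delivered = list(self.egress[port])
        self.egress[port].clear()
        return delivered


def simulate():
    sw = ToySwitch(["p2pmem", "rc"])
    sw.stalled.add("p2pmem")              # step 3: congestion towards p2pmem

    for i in range(3):                    # step 1: data MemWrs to p2pmem
        sw.route("MemWr", "p2pmem", f"data{i}")
    sw.route("MemWr", "rc", "cqe")        # step 2: completion queue entry

    seen_by_rc = sw.drain("rc")           # step 4: RC observes the CQE ...
    assert ("MemWr", "cqe") in seen_by_rc
    sw.route("MemRd", "p2pmem", "read")   # ... and issues a read of p2pmem

    sw.stalled.discard("p2pmem")          # congestion clears
    return sw.drain("p2pmem")             # step 5: arrival order at p2pmem


if __name__ == "__main__":
    for kind, tag in simulate():
        print(kind, tag)
```

In this model the MemRd arrives after the data MemWrs using only
per-egress-port FIFOs, with no switch-wide state. That result rests
entirely on the enqueue-order assumption in route(); if the data MemWrs
could reach the p2pmem egress FIFO later than the MemRd, the model
would no longer hold.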
