lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPcyv4iqnz1B00Q3xG-nGrLXdOyB7ditxmwZyotksLFgUqr+jA@mail.gmail.com>
Date:   Sun, 16 Apr 2017 08:53:45 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:     Logan Gunthorpe <logang@...tatee.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>,
        Christoph Hellwig <hch@....de>,
        Sagi Grimberg <sagi@...mberg.me>,
        "James E.J. Bottomley" <jejb@...ux.vnet.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Jens Axboe <axboe@...nel.dk>,
        Steve Wise <swise@...ngridcomputing.com>,
        Stephen Bates <sbates@...thlin.com>,
        Max Gurtovoy <maxg@...lanox.com>,
        Keith Busch <keith.busch@...el.com>, linux-pci@...r.kernel.org,
        linux-scsi <linux-scsi@...r.kernel.org>,
        linux-nvme@...ts.infradead.org, linux-rdma@...r.kernel.org,
        linux-nvdimm <linux-nvdimm@...1.01.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Jerome Glisse <jglisse@...hat.com>
Subject: Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

On Sat, Apr 15, 2017 at 8:01 PM, Benjamin Herrenschmidt
<benh@...nel.crashing.org> wrote:
> On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote:
>> I'm wondering, since this is limited to support behind a single
>> switch, if you could have a software-iommu hanging off that switch
>> device object that knows how to catch and translate the non-zero
>> offset bus address case. We have something like this with VMD driver,
>> and I toyed with a soft pci bridge when trying to support AHCI+NVME
>> bar remapping. When the dma api looks up the iommu for its device it
>> hits this soft-iommu and that driver checks if the page is host memory
>> or device memory to do the dma translation. You wouldn't need a bit in
>> struct page, just a lookup to the hosting struct dev_pagemap in the
>> is_zone_device_page() case and that can point you to p2p details.
>
> I was thinking about a hook in the arch DMA ops but that kind of
> wrapper might work instead indeed. However I'm not sure what's the best
> way to "instantiate" it.
>
> The main issue is that the DMA ops are a function of the initiator,
> not the target (since the target is supposed to be memory) so things
> are a bit awkward.
>
> One (user ?) would have to know that a given device "intends" to DMA
> directly to another device.
>
> This is awkward because in the ideal scenario, this isn't something the
> device knows. For example, one could want to have an existing NIC DMA
> directly to/from NVME pages or GPU pages.
>
> The NIC itself doesn't know the characteristic of these pages, but
> *something* needs to insert itself in the DMA ops of that bridge to
> make it possible.
>
> That's why I wonder if it's the struct page of the target that should
> be "marked" in such a way that the arch dma'ops can immediately catch
> that they belong to a device and might require "wrapped" operations.
>
> Are ZONE_DEVICE pages identifiable based on the struct page alone ? (a
> flag ?)

Yes, is_zone_device_page(). However I think we're getting to the point
with pmem, hmm, cdm, and now p2p where ZONE_DEVICE is losing specific
meaning and we need to have explicit type checks like is_hmm_page()
is_p2p_page() that internally check is_zone_device_page() plus some
other specific type.

> That would allow us to keep a fast path for normal memory targets, but
> also have some kind of way to handle the special cases of such peer 2
> peer (or also handle other type of peer to peer that don't necessarily
> involve PCI address wrangling but could require additional iommu bits).
>
> Just thinking out loud ... I don't have a firm idea or a design. But
> peer to peer is definitely a problem we need to tackle generically, the
> demand for it keeps coming up.

ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve
context about the physical address in question. I'm thinking you can
hang bus address translation data off of that structure. This seems
vaguely similar to what HMM is doing.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ