Message-ID: <b24c6f711d2e23792d6577a4ca508d75b0af4d9e.camel@wdc.com>
Date:   Fri, 26 Apr 2019 20:28:32 +0000
From:   Adam Manzanares <Adam.Manzanares@....com>
To:     "jglisse@...hat.com" <jglisse@...hat.com>,
        "lsf-pc@...ts.linux-foundation.org" 
        <lsf-pc@...ts.linux-foundation.org>
CC:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device

On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I
> would like to have a discussion on allowing direct block mapping of
> files for devices (NIC, GPU, FPGA, ...). This is an mm, fs and block
> discussion, though the mm side is pretty light, ie only adding 2
> callbacks to vm_operations_struct:
> 
>     int (*device_map)(struct vm_area_struct *vma,
>                       struct device *importer,
>                       struct dma_buf **bufp,
>                       unsigned long start,
>                       unsigned long end,
>                       unsigned flags,
>                       dma_addr_t *pa);
> 
>     // Some flags I can think of:
>     DEVICE_MAP_FLAG_PIN // ie return a dma_buf object
>     DEVICE_MAP_FLAG_WRITE // importer wants to be able to write
>     DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                       // operations on the mapping
> 
>     void (*device_unmap)(struct vm_area_struct *vma,
>                          struct device *importer,
>                          unsigned long start,
>                          unsigned long end,
>                          dma_addr_t *pa);
> 
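
For context, a minimal sketch of how a filesystem might wire the proposed
hooks into its vm_operations_struct, next to the existing fault handlers.
device_map/device_unmap are only the proposal above, not mainline API, and
the myfs_* names are hypothetical:

    /*
     * Hypothetical sketch only: device_map/device_unmap are the proposed
     * hooks, and the myfs_* handlers are made-up names.
     */
    #include <linux/mm.h>

    static const struct vm_operations_struct myfs_file_vm_ops = {
            .fault          = filemap_fault,        /* existing CPU fault path */
            .map_pages      = filemap_map_pages,
            .page_mkwrite   = myfs_page_mkwrite,    /* fs specific, hypothetical */
            .device_map     = myfs_device_map,      /* proposed hook */
            .device_unmap   = myfs_device_unmap,    /* proposed hook */
    };
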
> Each filesystem could add these callbacks and decide whether or not
> to allow the importer to directly map blocks. The filesystem can use
> whatever logic it wants to make that decision. For instance, if there
> are pages in the page cache for the range then it can say no and the
> device would fall back to main memory. The filesystem can also update
> its internal data structures to keep track of direct block mappings.
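
A sketch of that "refuse if the range is cached" policy, assuming the
proposed device_map hook. filemap_range_has_page() is existing kernel API;
everything named myfs_* is hypothetical, including the error convention:

    /*
     * Hypothetical myfs_device_map: say no (device falls back to main
     * memory) whenever the requested range is already in the page cache.
     */
    #include <linux/pagemap.h>

    static int myfs_device_map(struct vm_area_struct *vma,
                               struct device *importer,
                               struct dma_buf **bufp,
                               unsigned long start, unsigned long end,
                               unsigned flags, dma_addr_t *pa)
    {
            struct inode *inode = file_inode(vma->vm_file);
            loff_t first = ((loff_t)vma->vm_pgoff << PAGE_SHIFT) +
                           (start - vma->vm_start);
            loff_t last  = first + (end - start) - 1;

            if (filemap_range_has_page(inode->i_mapping, first, last))
                    return -EBUSY;          /* fall back to the page cache */

            /* record the mapping in fs metadata, then forward to the block
             * device so it can veto (BAR space, p2p support, ...);
             * myfs_forward_to_bdev is a hypothetical helper */
            return myfs_forward_to_bdev(inode, importer, bufp, first, last,
                                        flags, pa);
    }
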
> 
> If the filesystem decides to allow the direct block mapping then it
> forwards the request to the block device, which itself can decide to
> forbid the direct mapping, again for any reason: for instance running
> out of BAR space, peer to peer between the block device and the
> importer device not being supported, or the block device not wanting
> to allow writable peer mappings ...
> 
> 
> So the event flow is:
>     1  program mmaps a file (and never intends to access it with the CPU)
>     2  program tries to access the mmap from a device A
>     3  device A driver sees the device_map callback on the vma and calls it
>     4a on success, device A driver programs the device with the mapped
>        dma address
>     4b on failure, device A driver falls back to faulting so that it
>        can use pages from the page cache
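
A sketch of steps 3-4 from device A's driver side, assuming the proposed
callback exists on the vma; the deviceA_* helpers are hypothetical:

    /*
     * Hypothetical driver-side flow for steps 3-4: try the proposed
     * device_map hook, otherwise fall back to faulting pages in.
     */
    static int deviceA_map_range(struct device *dev, struct vm_area_struct *vma,
                                 unsigned long start, unsigned long end)
    {
            dma_addr_t pa;
            int ret = -ENODEV;

            if (vma->vm_ops && vma->vm_ops->device_map)             /* step 3 */
                    ret = vma->vm_ops->device_map(vma, dev, NULL,
                                                  start, end, 0, &pa);
            if (!ret)
                    return deviceA_program_dma(dev, pa);            /* step 4a */

            /* step 4b: fall back to faulting so the device works on pages
             * from the page cache instead of the raw blocks */
            return deviceA_fault_in_range(dev, vma, start, end);
    }
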
> 
> This API assumes that the importer supports mmu notifiers, and thus
> that the fs can invalidate a device mapping at _any_ time by sending
> an mmu notifier to all mappings of the file (for a given range in the
> file or for the whole file). Obviously you want to minimize
> disruption and thus only invalidate when necessary.
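
A sketch of the importer side of that contract: register an mmu notifier
and tear down the device mapping from invalidate_range_start. The
range-based callback signature matches roughly v5.0-era kernels; struct
deviceA_mirror and deviceA_unmap_range are hypothetical:

    /*
     * Hypothetical importer-side mmu notifier honouring fs-initiated
     * invalidation (range-based callbacks, ~v5.0 signatures).
     */
    #include <linux/mmu_notifier.h>

    struct deviceA_mirror {
            struct mmu_notifier     notifier;   /* registered against the mm */
            /* device-side mapping state ... */
    };

    static int deviceA_invalidate_range_start(struct mmu_notifier *mn,
                                    const struct mmu_notifier_range *range)
    {
            struct deviceA_mirror *mirror =
                    container_of(mn, struct deviceA_mirror, notifier);

            /* drop device access before the fs reuses or frees the blocks */
            deviceA_unmap_range(mirror, range->start, range->end);
            return 0;
    }

    static const struct mmu_notifier_ops deviceA_mn_ops = {
            .invalidate_range_start = deviceA_invalidate_range_start,
    };
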
> 
> The dma_buf parameter can be used to add pinning support for
> filesystems that wish to support that case too. Here the mapping
> lifetime gets disconnected from the vma and is transferred to the
> dma_buf allocated by the filesystem. Again, the filesystem can decide
> to say no, as pinning blocks has drastic consequences for the
> filesystem and block device.
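
A sketch of the pinned case, where the filesystem exports a dma_buf whose
lifetime outlives the vma. DEFINE_DMA_BUF_EXPORT_INFO()/dma_buf_export()
are the existing dma-buf export API; myfs_dmabuf_ops and the priv cookie
are hypothetical:

    /*
     * Hypothetical pinned path (DEVICE_MAP_FLAG_PIN): the fs exports a
     * dma_buf so the mapping's lifetime is detached from the vma.
     */
    #include <linux/dma-buf.h>

    static int myfs_export_pinned(void *pinned_blocks, size_t size,
                                  struct dma_buf **bufp)
    {
            DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
            struct dma_buf *buf;

            exp_info.ops  = &myfs_dmabuf_ops;   /* fs-specific dma_buf_ops */
            exp_info.size = size;
            exp_info.priv = pinned_blocks;      /* unpinned in the release op */

            buf = dma_buf_export(&exp_info);
            if (IS_ERR(buf))
                    return PTR_ERR(buf);

            *bufp = buf;
            return 0;
    }
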
> 
> 
> This has some similarities to the hmmap and caching topic (which is
> mapping blocks directly to the CPU, AFAIU) but device mapping can cut
> some corners; for instance some devices can forgo atomic operations
> on such mappings and thus can work over PCIe, while the CPU cannot do
> atomics to a PCIe BAR.
> 
> Also, this API can be used to allow peer to peer access between
> devices when the vma is an mmap of a device file, and thus the
> vm_operations_struct comes from some exporter device driver. So the
> same 2 vm_operations_struct callbacks can be used in more cases than
> what I just described here.
> 
> 
> So I would like to gather people's feedback on the general approach
> and a few things like:
>     - Do block devices need to be able to invalidate such mappings
>       too?
> 
>       It is easy for the fs to invalidate, as it can walk the file's
>       mappings, but the block device does not know about the file.
> 
>     - Do we want to provide some generic implementation to share
>       across filesystems?
> 
>     - Maybe some shared helpers for block devices that could track
>       the file corresponding to a peer mapping?

I'm interested in being a part of this discussion.

> 
> 
> Cheers,
> Jérôme
