Message-ID: <913df4b4-fc4a-409d-9007-088a3e2c8291@oracle.com>
Date: Mon, 31 Mar 2025 10:46:40 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Jason Gunthorpe <jgg@...pe.ca>,
        Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Leon Romanovsky <leon@...nel.org>, Robin Murphy <robin.murphy@....com>,
        Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Sagi Grimberg <sagi@...mberg.me>, Keith Busch <kbusch@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Logan Gunthorpe <logang@...tatee.com>,
        Yishai Hadas <yishaih@...dia.com>,
        Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
        Kevin Tian <kevin.tian@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Jérôme Glisse <jglisse@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        linux-rdma@...r.kernel.org, iommu@...ts.linux.dev,
        linux-nvme@...ts.infradead.org, linux-pci@...r.kernel.org,
        kvm@...r.kernel.org, linux-mm@...ck.org,
        Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [PATCH v7 00/17] Provide a new two step DMA mapping API

On 3/21/25 8:41 PM, Jason Gunthorpe wrote:
> On Fri, Mar 21, 2025 at 12:52:30AM +0100, Marek Szyprowski wrote:
>>> Christoph's vision was to make a performance DMA API path that could
>>> be used to implement any scatterlist-like data structure very
>>> efficiently without having to teach the DMA API about all sorts of
>>> scatterlist-like things.
>>
>> Thanks for explaining one more motivation behind this patchset!
> 
> Sure, no problem.
> 
> To close the loop on the bigger picture here..
> 
> When you put the parts together:
> 
>  1) dma_map_sg is the only API that is both performant and fully
>     functional
> 
>  2) scatterlist is a horrible leaky design and badly misused all over
>     the place. When Logan added SG_DMA_BUS_ADDRESS it became quite
>     clear that any significant changes to scatterlist are infeasible,
>     or at least we'd break a huge number of untestable legacy drivers
>     in the process.
> 
>  3) We really want to do full featured performance DMA *without* a
>     struct page. This requires changing scatterlist, inventing a new
>     scatterlist v2 and DMA map for it, or this idea here of a flexible
>     lower level DMA API entry point.
> 
>     Matthew has been talking about struct-pageless for a long time now
>     from the block/mm direction using folio & memdesc and this is
>     meeting his work from the other end of the stack by starting to
>     build a way to do DMA on future struct pageless things. This is
>     going to be a huge multi-year project but small parts like this need
>     to be solved and agreed to make progress.
> 
>  4) In the immediate moment we still have problems in VFIO, RDMA, and
>     DRM managing P2P transfers because dma_map_resource/page() don't
>     properly work, and we don't have struct pages to use
>     dma_map_sg(). Hacks around the DMA API have been in the kernel for
>     a long time now; we want to see a properly architected solution.

The in-kernel NFS stack, for example, already has a mechanism for
receiving and sending RPC messages using arrays of bio_vecs. The stack
can use bio_vecs natively for communicating with both the page cache and
the kernel socket API.

But NFS's RPC/RDMA transport still has to convert these bio_vec arrays
into a scatterlist so that they can be mapped and then handed to the
RDMA core.
Instead, having a DMA mapping API that can take an array of bio_vecs
directly (and then, a similar API within the RDMA core) would make
NFS/RDMA a lot more CPU-efficient.

The lack of a bio_vec DMA mapping API has held up a full conversion of
the in-kernel NFS stack to use folios. That's the reason I tried my
own hand at adding a bio_vec DMA mapping API last summer.

Leon and Christoph have provided a clean step in the right direction
and it looks to me like they have thought carefully about next steps.
Robin pointed out some areas that might be lacking in v7, but IMHO
there is a plan to address many of these areas in subsequent work. I
don't see a reason not to proceed with this first step.


-- 
Chuck Lever
