linux-kernel - Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190626065708.GB24531@lst.de>
Date:   Wed, 26 Jun 2019 08:57:08 +0200
From:   Christoph Hellwig <hch@....de>
To:     Logan Gunthorpe <logang@...tatee.com>
Cc:     Christoph Hellwig <hch@....de>, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-pci@...r.kernel.org, linux-rdma@...r.kernel.org,
        Jens Axboe <axboe@...nel.dk>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Keith Busch <kbusch@...nel.org>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Stephen Bates <sbates@...thlin.com>
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

On Tue, Jun 25, 2019 at 01:54:21PM -0600, Logan Gunthorpe wrote:
> Well whether it's dma_addr_t, phys_addr_t, pfn_t the result isn't all
> that different. You still need roughly the same 'if' hooks for any
> backed memory that isn't in the linear mapping and you can't get a
> kernel mapping for directly.
> 
> It wouldn't be too hard to do a similar patch set that uses something
> like phys_addr_t instead and have a request and queue flag for support
> of non-mappable memory. But you'll end up with very similar 'if' hooks
> and we'd have to clean up all bio-using drivers that access the struct
> pages directly.

We'll need to clean that mess up anyway, and I've been chugging
along doing some of that.  A lot still assume no highmem, so we need
to convert them over to something that kmaps anyway.  If we get
the abstraction right that will actually help converting over to
a better reprsentation.

> Though, we'd also still have the problem of how to recognize when the
> address points to P2PDMA and needs to be translated to the bus offset.
> The map-first inversion was what helped here because the driver
> submitting the requests had all the information. Though it could be
> another request flag and indicating non-mappable memory could be a flag
> group like REQ_NOMERGE_FLAGS -- REQ_NOMAP_FLAGS.

The assumes the request all has the same memory, which is a simplifing
assuption.  My idea was that if had our new bio_vec like this:

struct bio_vec {
	phys_addr_t		paddr; // 64-bit on 64-bit systems
	unsigned long		len;
};

we have a hole behind len where we could store flag.  Preferably
optionally based on a P2P or other magic memory types config
option so that 32-bit systems with 32-bit phys_addr_t actually
benefit from the smaller and better packing structure.

> If you think any of the above ideas sound workable I'd be happy to try
> to code up another prototype.

Іt sounds workable.  To some of the first steps are cleanups independent
of how the bio_vec is eventually going to look like.  That is making
the DMA-API internals work on the phys_addr_t, which also unifies the
map_resource implementation with map_page.  I plan to do that relatively
soon.  The next is sorting out access to bios data by virtual address.
All these need nice kmapping helper that avoid too much open coding.
I was going to look into that next, mostly to kill the block layer
bounce buffering code.  Similar things will also be needed at the
scatterlist level I think.  After that we need to more audits of
how bv_page is still used.  something like a bv_phys() helper that
does "page_to_phys(bv->bv_page) + bv->bv_offset" might come in handy
for example.