[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190626065708.GB24531@lst.de>
Date: Wed, 26 Jun 2019 08:57:08 +0200
From: Christoph Hellwig <hch@....de>
To: Logan Gunthorpe <logang@...tatee.com>
Cc: Christoph Hellwig <hch@....de>, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-pci@...r.kernel.org, linux-rdma@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Dan Williams <dan.j.williams@...el.com>,
Sagi Grimberg <sagi@...mberg.me>,
Keith Busch <kbusch@...nel.org>,
Jason Gunthorpe <jgg@...pe.ca>,
Stephen Bates <sbates@...thlin.com>
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA
On Tue, Jun 25, 2019 at 01:54:21PM -0600, Logan Gunthorpe wrote:
> Well whether it's dma_addr_t, phys_addr_t, pfn_t the result isn't all
> that different. You still need roughly the same 'if' hooks for any
> backed memory that isn't in the linear mapping and you can't get a
> kernel mapping for directly.
>
> It wouldn't be too hard to do a similar patch set that uses something
> like phys_addr_t instead and have a request and queue flag for support
> of non-mappable memory. But you'll end up with very similar 'if' hooks
> and we'd have to clean up all bio-using drivers that access the struct
> pages directly.
We'll need to clean that mess up anyway, and I've been chugging
along doing some of that. A lot still assume no highmem, so we need
to convert them over to something that kmaps anyway. If we get
the abstraction right that will actually help converting over to
a better reprsentation.
> Though, we'd also still have the problem of how to recognize when the
> address points to P2PDMA and needs to be translated to the bus offset.
> The map-first inversion was what helped here because the driver
> submitting the requests had all the information. Though it could be
> another request flag and indicating non-mappable memory could be a flag
> group like REQ_NOMERGE_FLAGS -- REQ_NOMAP_FLAGS.
The assumes the request all has the same memory, which is a simplifing
assuption. My idea was that if had our new bio_vec like this:
struct bio_vec {
phys_addr_t paddr; // 64-bit on 64-bit systems
unsigned long len;
};
we have a hole behind len where we could store flag. Preferably
optionally based on a P2P or other magic memory types config
option so that 32-bit systems with 32-bit phys_addr_t actually
benefit from the smaller and better packing structure.
> If you think any of the above ideas sound workable I'd be happy to try
> to code up another prototype.
Іt sounds workable. To some of the first steps are cleanups independent
of how the bio_vec is eventually going to look like. That is making
the DMA-API internals work on the phys_addr_t, which also unifies the
map_resource implementation with map_page. I plan to do that relatively
soon. The next is sorting out access to bios data by virtual address.
All these need nice kmapping helper that avoid too much open coding.
I was going to look into that next, mostly to kill the block layer
bounce buffering code. Similar things will also be needed at the
scatterlist level I think. After that we need to more audits of
how bv_page is still used. something like a bv_phys() helper that
does "page_to_phys(bv->bv_page) + bv->bv_offset" might come in handy
for example.
Powered by blists - more mailing lists