Message-ID: <1143687.1750755725@warthog.procyon.org.uk>
Date: Tue, 24 Jun 2025 10:02:05 +0100
From: David Howells <dhowells@...hat.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: dhowells@...hat.com, Andrew Lunn <andrew@...n.ch>,
Eric Dumazet <edumazet@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
David Hildenbrand <david@...hat.com>,
John Hubbard <jhubbard@...dia.com>,
Mina Almasry <almasrymina@...gle.com>, willy@...radead.org,
Christian Brauner <brauner@...nel.org>,
Al Viro <viro@...iv.linux.org.uk>, netdev@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Leon Romanovsky <leon@...nel.org>,
Logan Gunthorpe <logang@...tatee.com>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: How to handle P2P DMA with only {physaddr,len} in bio_vec?
Christoph Hellwig <hch@...radead.org> wrote:
> On Mon, Jun 23, 2025 at 11:50:58AM +0100, David Howells wrote:
> > What's the best way to manage this without having to go back to the page
> > struct for every DMA mapping we want to make?
>
> There isn't a very easy way. Also because if you actually need to do
> peer to peer transfers, you right now absolutely need the page to find
> the pgmap that has the information on how to perform the peer to peer
> transfer.
Are you expecting P2P to become particularly common? Because page struct
lookups will become more expensive because we'll have to do type checking and
Willy may eventually move them from a fixed array into a maple tree - so if we
can record the P2P flag in the bio_vec, it would help speed up the "not P2P"
case.
> > Do we need to have
> > iov_extract_user_pages() note this in the bio_vec?
> >
> > struct bio_vec {
> > 	physaddr_t	bv_base_addr;	/* 64-bits */
> > 	size_t		bv_len:56;	/* Maybe just u32 */
> > 	bool		p2pdma:1;	/* Region is involved in P2P */
> > 	unsigned int	spare:7;
> > };
>
> Having a flag in the bio_vec might be a way to shortcut the P2P or not
> decision a bit. The downside is that without the flag, the bio_vec
> in the brave new page-less world would actually just be:
>
> struct bio_vec {
> 	phys_addr_t	bv_phys;
> 	u32		bv_len;
> } __packed;
>
> i.e. adding any more information would actually increase the size from
> 12 bytes to 16 bytes for the usual 64-bit phys_addr_t setups, and thus
> undo all the memory savings that this move would provide.
Do we actually need 32 bits for bv_len, especially given that MAX_RW_COUNT is
capped at a bit less than 2GiB? Could we, say, do:
	struct bio_vec {
		phys_addr_t	bv_phys;
		u32		bv_len:31;
		u32		bv_use_p2p:1;
	} __packed;
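As a rough userspace sanity check of that layout (a sketch only, not kernel
code): the typedefs below stand in for the kernel types, phys_addr_t is
assumed to be 64 bits, SKETCH_MAX_RW_COUNT assumes 4KiB pages (INT_MAX &
PAGE_MASK), and bio_vec_make() is a hypothetical helper invented for
illustration.  With GCC or Clang the flag should fit without growing the
struct past 12 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types (assumption: 64-bit physaddr). */
typedef uint64_t phys_addr_t;
typedef uint32_t u32;

/* Assumption: 4KiB pages, so MAX_RW_COUNT = INT_MAX & PAGE_MASK. */
#define SKETCH_MAX_RW_COUNT 0x7ffff000u

struct bio_vec {
	phys_addr_t	bv_phys;
	u32		bv_len:31;	/* large enough for MAX_RW_COUNT */
	u32		bv_use_p2p:1;	/* region is involved in P2P */
} __attribute__((packed));

/* The point of the exercise: the flag must not cost any extra bytes. */
_Static_assert(sizeof(struct bio_vec) == 12, "bio_vec grew past 12 bytes");

/* Hypothetical helper, for illustration only. */
static inline struct bio_vec bio_vec_make(phys_addr_t phys, u32 len, int p2p)
{
	struct bio_vec bv;

	assert(len <= SKETCH_MAX_RW_COUNT);	/* must fit in 31 bits */
	bv.bv_phys = phys;
	bv.bv_len = len;
	bv.bv_use_p2p = p2p ? 1 : 0;
	return bv;
}
```

The cost is that every read of bv_len becomes a masked load, which may or may
not matter in practice.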
And rather than storing the how-to-do-P2P info in the page struct, does it
make sense to hold it separately, keyed on bv_phys?
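To illustrate what "keyed on bv_phys" might look like, here is a minimal
userspace sketch using a sorted table of non-overlapping physical ranges and
a binary search.  Everything in it is hypothetical (struct p2p_region,
p2p_lookup() and provider_id are made-up names); in the kernel a maple tree
or xarray would presumably be the natural container rather than a flat
array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */

/* Hypothetical record describing one P2P-capable physical range. */
struct p2p_region {
	phys_addr_t	start;		/* inclusive */
	phys_addr_t	end;		/* exclusive */
	int		provider_id;	/* stand-in for the how-to-do-P2P info */
};

/*
 * Find the region containing addr, or NULL if addr is not P2P memory.
 * tbl must be sorted by start and the ranges must not overlap.
 */
static const struct p2p_region *p2p_lookup(const struct p2p_region *tbl,
					   size_t n, phys_addr_t addr)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tbl[mid].start)
			hi = mid;
		else if (addr >= tbl[mid].end)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

Combined with a per-bio_vec flag, such a lookup would only need to be done
for segments that are actually marked as P2P.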
Also, is it possible for the networking stack, say, to trivially map the P2P
memory in order to checksum it? I presume bv_phys in that case would point to
a mapping of device memory?
Thanks,
David