Message-ID: <CAC_iWjL4YfCOffAZPUun5wggxrqAanjd+8SgmJQN0yyWsvb3sg@mail.gmail.com>
Date: Fri, 18 Aug 2023 09:12:09 +0300
From: Ilias Apalodimas <ilias.apalodimas@...aro.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Mina Almasry <almasrymina@...gle.com>, Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
pabeni@...hat.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Lorenzo Bianconi <lorenzo@...nel.org>, Alexander Duyck <alexander.duyck@...il.com>,
Liang Chen <liangchen.linux@...il.com>,
Alexander Lobakin <aleksander.lobakin@...el.com>, Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
> > > Can we assume the DMA mapping of page pool is page aligned? We should
> > > be, right?
> >
> > Yes
> >
> > > That means we're storing 12 bits of 0 at the lower end.
> > > So even with 32b of space we can easily store addresses for 32b+12b =>
> > > 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
> > > problem is only in our heads?
> >
> > Do you mean moving the pp_frag_count there?
>
> Right, IIUC we don't have enough space to fit dma_addr_t and the
> refcount, but if we store the dma addr on a shifted u32 instead
> of using dma_addr_t explicitly - the refcount should fit?
struct page looks like this:

        unsigned long dma_addr;
        union {
                unsigned long dma_addr_upper;
                atomic_long_t pp_frag_count;
        };
So, on 32-bit platforms with 64-bit DMA we can't support a frag count
at all. We could either use the lower 12 bits of dma_addr (and support
'only' 4096 frags) or do what you suggest. TBH I don't love either of
these and, since those platforms are rare (or at least that's what I
think), I'd prefer not supporting them at all.
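
FWIW, a rough sketch (untested, helper name made up) of the shifted-u32
idea, just to make sure we are talking about the same thing:

/* On 32-bit arches with 64-bit dma_addr_t, keep only
 * dma_addr >> PAGE_SHIFT in the single unsigned long slot. Since the
 * mapping is page aligned, that covers 32 + PAGE_SHIFT bits of DMA
 * address space and frees the other member of the union for
 * pp_frag_count.
 */
static inline int page_pool_set_dma_addr_sketch(struct page *page,
                                                dma_addr_t addr)
{
        if (sizeof(dma_addr_t) > sizeof(unsigned long)) {
                page->dma_addr = addr >> PAGE_SHIFT;
                /* bail out if the shifted address doesn't round-trip */
                if ((dma_addr_t)page->dma_addr << PAGE_SHIFT != addr)
                        return -EOVERFLOW;
                return 0;
        }
        page->dma_addr = addr;
        return 0;
}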
>
> > I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
> > overall. With Yunshengs patches such a platform would allocate a
> > page, so why should we prevent it from splitting it internally?
>
> Splitting it is fine, the problem is that the refcount AKA
> page->pp_frag_count** counts outstanding PP-aware references
> and page->refcount counts PP-unaware references.
>
> If we want to use page->refcount directly we'd need to unmap
> the page whenever drivers calls page_pool_defrag_page().
> But the driver may assume the page is still mapped afterwards.
What I am suggesting here is not to add the new
PP_FLAG_PAGE_SPLIT_IN_DRIVER flag at all. If a driver wants to split
pages internally, it should create a pool without PP_FLAG_DMA_SYNC_DEV
to begin with. The only responsibility the driver would have is to
elevate the page refcnt so the page pool won't try to free/recycle the
page underneath it. Since such a pool won't be able to allocate
fragments, we don't have to worry about the rest.
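
Something along these lines on the driver side (hypothetical, just to
illustrate; 'nr_frags' being however many pieces the driver splits the
page into):

        /* pool created without PP_FLAG_DMA_SYNC_DEV; the driver syncs
         * and splits the page itself, taking one extra reference per
         * additional fragment so page_pool won't recycle the page
         * while fragments are still in flight
         */
        page = page_pool_alloc_pages(pool, GFP_ATOMIC);
        if (!page)
                return -ENOMEM;

        page_ref_add(page, nr_frags - 1);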
> We can change the API to make this behavior explicit. Although
> IMHO that's putting the burden of rare platforms on non-rare
> platforms which we should avoid.
Yep, agree here.
>
> ** I said it before and I will keep saying this until someone gets
> angry at me - I really think we should rename this field because
> the association with frags is a coincidence.
I had similar confusion every time I had to re-read our code, hence
'git show 4d4266e3fd32'.
Any suggestions?
Thanks
/Ilias