Message-ID: <Z-Sb0Q-inlkdTopW@x130>
Date: Wed, 26 Mar 2025 17:29:05 -0700
From: Saeed Mahameed <saeedm@...dia.com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Simon Horman <horms@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Yonglong Liu <liuyonglong@...wei.com>,
Yunsheng Lin <linyunsheng@...wei.com>,
Pavel Begunkov <asml.silence@...il.com>,
Matthew Wilcox <willy@...radead.org>, netdev@...r.kernel.org,
bpf@...r.kernel.org, linux-rdma@...r.kernel.org, linux-mm@...ck.org,
Qiuling Ren <qren@...hat.com>, Yuying Ma <yuma@...hat.com>
Subject: Re: [PATCH net-next v2 3/3] page_pool: Track DMA-mapped pages and
unmap them when destroying the pool
On 26 Mar 13:02, Mina Almasry wrote:
>On Wed, Mar 26, 2025 at 11:22 AM Saeed Mahameed <saeedm@...dia.com> wrote:
>>
>> On 25 Mar 16:45, Toke Høiland-Jørgensen wrote:
>> >When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>> >they are released from the pool, to avoid the overhead of re-mapping the
>> >pages every time they are used. This causes resource leaks and/or
>> >crashes when there are pages still outstanding while the device is torn
>> >down, because page_pool will attempt an unmap through a non-existent DMA
>> >device on the subsequent page return.
>> >
>>
>> Why dynamically track when it is guaranteed the page_pool consumer (driver)
>> will return all outstanding pages before disabling the DMA device.
>> When a page pool is destroyed by the driver, just mark it as "DMA-inactive",
>> and on page_pool_return_page() if DMA-inactive don't recycle those pages
>> and immediately DMA unmap and release them.
>
>That doesn't work, AFAIU. DMA unmapping after page_pool_destroy has
>been called is what's causing the very bug this series is trying to
>fix. What happens is:
>
>1. Driver calls page_pool_destroy,
Here the driver should already have returned all inflight pages to the
pool. The problem is that those returned pages are recycled back instead
of being DMA unmapped; all we need to do is mark the page_pool as "don't
recycle" after the driver has destroyed it.
>2. Driver removes the net_device (and I guess the associated iommu
>structs go away with it).
If the driver has not yet released those pages at this point, then there is a
more serious leak than just a DMA leak. If the pool holds those pages, which is
probably the case, then they were already released by the driver; the problem
is that they were recycled back.
>3. Page-pool tries to unmap after page_pool_destroy is called, trying
>to fetch iommu resources that have been freed due to the netdevice
>gone away = bad stuff.
>
>(but maybe I misunderstood your suggestion)
Yes, see above. But I just double-checked the code: I thought that
page_pool_destroy would wait for all inflight pages to be released before
returning to the caller, but apparently I was mistaken. If the pages are
still being held by the stack/user-space, they are still considered
"inflight" even though the driver is done with them :/.
So yes, tracking "ALL" pages is one way to solve it, but I still think the
correct way to deal with this is to hold the driver/netdev while there are
inflight pages. No strong opinion, though: if we expect pages to remain in
userspace for "too" long, then I admit tracking is the only way.
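The alternative above ("hold the driver/netdev while there are inflight pages") can also be sketched as a toy refcount-style model: the pool counts outstanding pages, and teardown of the DMA device only completes once the count drops to zero. Again, every name here is hypothetical, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the pool pins the DMA device while pages are inflight. */
struct toy_pool {
	int inflight;		/* pages handed to the stack, not yet back */
	bool destroy_wanted;	/* driver has called destroy */
	bool dma_dev_alive;	/* DMA device must not go away while true */
};

static void toy_try_release_dev(struct toy_pool *pool)
{
	/* Teardown completes only when destroy was requested AND
	 * the last inflight page has been returned. */
	if (pool->destroy_wanted && pool->inflight == 0)
		pool->dma_dev_alive = false;
}

static void toy_pool_get_page(struct toy_pool *pool)
{
	pool->inflight++;
}

static void toy_pool_put_page(struct toy_pool *pool)
{
	pool->inflight--;
	toy_try_release_dev(pool);
}

static void toy_pool_destroy(struct toy_pool *pool)
{
	pool->destroy_wanted = true;
	toy_try_release_dev(pool);	/* may complete immediately */
}
```

The trade-off is exactly the one stated above: this keeps unmapping safe without per-page tracking, but if the stack or userspace sits on pages indefinitely, the netdev is pinned for just as long.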
>
>--
>Thanks,
>Mina