linux-kernel - Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound

Message-ID: <6233e2c3-3fea-4ed0-bdcc-9a625270da37@huawei.com>
Date: Tue, 26 Nov 2024 16:22:30 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, <davem@...emloft.net>,
	<kuba@...nel.org>, <pabeni@...hat.com>
CC: <liuyonglong@...wei.com>, <fanghaiqing@...wei.com>,
	<zhangkun09@...wei.com>, Robin Murphy <robin.murphy@....com>, Alexander Duyck
	<alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>, Ilias Apalodimas
	<ilias.apalodimas@...aro.org>, Eric Dumazet <edumazet@...gle.com>, Simon
 Horman <horms@...nel.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has
 already unbound

On 2024/11/25 23:25, Jesper Dangaard Brouer wrote:

...

>>>> +
>>>>    void page_pool_destroy(struct page_pool *pool)
>>>>    {
>>>>        if (!pool)
>>>> @@ -1139,6 +1206,8 @@ void page_pool_destroy(struct page_pool *pool)
>>>>         */
>>>>        synchronize_rcu();
>>>>    +    page_pool_inflight_unmap(pool);
>>>> +
>>>
>>> Reaching here means we have detected in-flight packets/pages.
>>>
>>> In "page_pool_inflight_unmap" we scan and find those in-flight pages to
>>> DMA unmap them. Then below we wait for these in-flight pages again.
>>> Why don't we just "release" (page_pool_release_page) those in-flight
>>> pages from belonging to the page_pool, when we found them during scanning?
>>>
>>> If doing so, we can hopefully remove the periodic checking code below.
>>
>> I thought about that too, but it means more complicated work than just
>> calling the page_pool_release_page() as page->pp_ref_count need to be
>> converted into page->_refcount for the above to work, it seems hard to
>> do that with least performance degradation as the racing against
>> page_pool_put_page() being called concurrently.
>>
> 
> Maybe we can have a design that avoid/reduce concurrency.  Can we
> convert the suggested pool->destroy_lock into an atomic?
> (Doing an *atomic* READ in page_pool_return_page, should be fast if we
> keep this cache in (cache coherence) Shared state).
> 
> In your new/proposed page_pool_return_page() when we see the
> "destroy_cnt" (now atomic READ) bigger than zero, then we can do nothing
> (or maybe we need decrement page-refcnt?), as we know the destroy code

Is it valid for page->_refcount to be zero while page_pool still owns
the page, if we only decrement page->_refcount and do not clear
page->pp_magic? What happens if put_page() is called from another
subsystem on a page_pool-owned page? Doesn't that mean the page might be
returned to the buddy page allocator, causing a use-after-free problem?
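
For illustration only (not part of the posted patch): a minimal user-space
model of the concern above. The struct, the helper names and the signature
value are all hypothetical stand-ins, not the kernel's struct page,
put_page() or PP_SIGNATURE. It only shows that dropping the last reference
without clearing the signature leaves a page that has been freed but still
looks pool-owned.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary value for the model; not the kernel's PP_SIGNATURE. */
#define MODEL_PP_SIGNATURE 0x40UL

/* Hypothetical stand-in for the relevant fields of struct page. */
struct model_page {
	int refcount;            /* models page->_refcount */
	unsigned long pp_magic;  /* models page->pp_magic */
	bool in_buddy;           /* true once handed back to the allocator */
};

/* Models put_page(): drop one reference, free on the final put. */
static void model_put_page(struct model_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		page->in_buddy = true;
}

/* Models an ownership test that looks at the signature only. */
static bool model_pool_owns(const struct model_page *page)
{
	return page->pp_magic == MODEL_PP_SIGNATURE;
}

/* Returns true when the page is freed yet still looks pool-owned. */
bool model_stale_ownership(void)
{
	struct model_page page = {
		.refcount = 1,
		.pp_magic = MODEL_PP_SIGNATURE,
		.in_buddy = false,
	};

	/* The last reference is dropped, but pp_magic is never cleared. */
	model_put_page(&page);

	return page.in_buddy && model_pool_owns(&page);
}
```

In this model, model_stale_ownership() reports the inconsistent state that
the questions above are worried about.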

> will be taking care of "releasing" the pages from the page pool.

If page->_refcount is not decremented in page_pool_return_page(), how
does page_pool_destroy() know whether page_pool_return_page() has already
been called for a specific page? Is an extra state needed to indicate that?

And there might still be a race between the checking/handling of the extra
state in page_pool_destroy() and the setting of that state in
page_pool_return_page(), so something like a lock might still be needed to
avoid that race.
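
For illustration only: a small user-space sketch of the check-then-set race
described above, replayed by hand in one thread so the outcome is
deterministic. The "released" flag and all helper names are hypothetical;
no such field exists in page_pool today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page "extra state"; not an existing page_pool field. */
struct model_page {
	bool released;   /* the extra state */
	int dma_unmaps;  /* how many times the page was unmapped */
};

/* Step 1 of a non-atomic release: observe the state. */
static bool model_check(const struct model_page *page)
{
	return !page->released;
}

/* Step 2: act on the earlier observation, then publish the state. */
static void model_act(struct model_page *page, bool may_release)
{
	if (may_release) {
		page->dma_unmaps++;
		page->released = true;
	}
}

/*
 * Replays one unlucky interleaving: both paths check the state before
 * either one sets it. Returns the number of unmaps performed.
 */
int model_racy_interleaving(void)
{
	struct model_page page = { .released = false, .dma_unmaps = 0 };

	bool destroy_may_release = model_check(&page); /* destroy path */
	bool return_may_release = model_check(&page);  /* return path */

	model_act(&page, destroy_may_release);
	model_act(&page, return_may_release);

	assert(page.released);
	return page.dma_unmaps;
}
```

With this interleaving both paths believe they own the release, so the
page is unmapped twice in the model.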

> 
> Once a page is released from a page pool it becomes a normal page,
> that adheres to normal page refcnt'ing. That is how it worked before with
> page_pool_release_page().
> The later extensions with page fragment support and devmem might have
> complicated this code path.

As page_pool_return_page() and page_pool_destroy() may both try to
"release" the same page concurrently, I am not sure how a simple *atomic*
can avoid this kind of race, even before page fragment and devmem were
supported. It would be good to be more specific about that with some
pseudocode.
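
For illustration only: one shape such pseudocode could take, as a
user-space C11 sketch. It assumes a per-page "claimed" word (hypothetical,
not an existing field) and uses a single atomic exchange so that only one
of the two paths performs the release. It does not address the
pp_ref_count to _refcount conversion or the fast-path cost raised earlier
in this thread, and a pool-wide counter alone would not give this
per-page exclusivity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-page state for the sketch. */
struct claim_page {
	atomic_int claimed;  /* 0 = still pool-owned, 1 = release claimed */
	int dma_unmaps;      /* how many times the page was unmapped */
};

/* Exactly one caller observes the 0 -> 1 transition. */
static bool claim_release(struct claim_page *page)
{
	return atomic_exchange(&page->claimed, 1) == 0;
}

/* Called by both paths; only the winner of the claim unmaps. */
static void release_once(struct claim_page *page)
{
	if (claim_release(page))
		page->dma_unmaps++;
}

/* Both paths attempt the release; returns the number of unmaps. */
int claim_demo(void)
{
	struct claim_page page = { .dma_unmaps = 0 };

	atomic_init(&page.claimed, 0);

	release_once(&page); /* e.g. the destroy path */
	release_once(&page); /* e.g. the return path */

	assert(atomic_load(&page.claimed) == 1);
	return page.dma_unmaps;
}
```

The exchange is a read-modify-write on every return, which is more
expensive than the plain atomic READ suggested in the quoted text.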

I looked at it more closely: previously, it seems page_pool_put_page() was
not allowed to be called after page_pool_release_page() had been called
for a specific page, mainly because of the concurrent checking/handling
and clearing of page->pp_magic, if I understand it correctly:
https://elixir.bootlin.com/linux/v5.16.20/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5316