Message-ID: <91964676-522f-44b5-87bc-27cfa5193813@huawei.com>
Date: Fri, 22 Nov 2024 15:20:33 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Robin Murphy <robin.murphy@....com>, <davem@...emloft.net>,
	<kuba@...nel.org>, <pabeni@...hat.com>
CC: <liuyonglong@...wei.com>, <fanghaiqing@...wei.com>,
	<zhangkun09@...wei.com>, Alexander Duyck <alexander.duyck@...il.com>, Andrew
 Morton <akpm@...ux-foundation.org>, IOMMU <iommu@...ts.linux.dev>, MM
	<linux-mm@...ck.org>, Jesper Dangaard Brouer <hawk@...nel.org>, Ilias
 Apalodimas <ilias.apalodimas@...aro.org>, Eric Dumazet <edumazet@...gle.com>,
	Simon Horman <horms@...nel.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC v4 3/3] page_pool: skip dma sync operation for
 inflight pages

On 2024/11/21 21:44, Robin Murphy wrote:
> On 21/11/2024 8:04 am, Yunsheng Lin wrote:
>> On 2024/11/21 0:17, Robin Murphy wrote:
>>> On 20/11/2024 10:34 am, Yunsheng Lin wrote:
>>>> Skip dma sync operation for inflight pages before the
>>>> page_pool_destroy() returns to the driver as DMA API
>>>> expects to be called with a valid device bound to a
>>>> driver as mentioned in [1].
>>>>
>>>> After page_pool_destroy() is called, the page is not
>>>> expected to be recycled back to pool->alloc cache and
>>>> dma sync operation is not needed when the page is not
>>>> recyclable or pool->ring is full, so only skip the dma
>>>> sync operation for the inflight pages by clearing the
>>>> pool->dma_sync under protection of rcu lock when page
>>>> is recycled to pool->ring to ensure that there is no
>>>> dma sync operation called after page_pool_destroy() is
>>>> returned.
>>>
>>> Something feels off here - either this is a micro-optimisation which I wouldn't really expect to be meaningful, or it means patch #2 doesn't actually do what it claims. If it really is possible to attempt to dma_sync a page *after* page_pool_inflight_unmap() has already reclaimed and unmapped it, that represents yet another DMA API lifecycle issue, which as well as being even more obviously incorrect usage-wise, could also still lead to the same crash (if the device is non-coherent).
>>
>> For a page_pool owned page, it mostly goes through the below steps:
>> 1. page_pool calls buddy allocator API to allocate a page, call DMA mapping
>>     and sync_for_device API for it if its pool is empty. Or reuse the page in
>>     pool.
>>
>> 2. Driver calls the page_pool API to allocate the page, and pass the page
>>     to network stack after packet is dma'ed into the page and the sync_for_cpu
>>     API is called.
>>
>> 3. Network stack is done with page and called page_pool API to free the page.
>>
>> 4. page_pool releases the page back to buddy allocator if the page is not
>>     recyclable before doing the dma unmapping. Or do the sync_for_device
>>     and put the page in its pool, the page might go through step 1
>>     again if the driver calls the page_pool allocate API.
>>
>> The calling of dma mapping and dma sync API is controlled by pool->dma_map
>> and pool->dma_sync respectively, the previous patch only clear pool->dma_map
>> after doing the dma unmapping. This patch ensures that there is no dma_sync
>> for recycle case of step 4 by clearing pool->dma_sync.
> 
> But *why* does it want to ensure that? Is there some possible race where one thread can attempt to sync and recycle a page while another thread is attempting to unmap and free it, such that you can't guarantee the correctness of dma_sync calls after page_pool_inflight_unmap() has started, and skipping them is a workaround for that? If so, then frankly I think that would want solving properly, but at the very least this change would need to come before patch #2.

The race window looks like the one below. page_pool_destroy() and
page_pool_put_page() can be called concurrently. Patch 2 only uses
a spinlock to synchronise page_pool_inflight_unmap() with
page_pool_return_page() called by page_pool_put_page(), which avoids
concurrent dma unmapping. There is no synchronization between
page_pool_destroy() and page_pool_dma_sync_for_device() called
by page_pool_put_page():

            CPU0                           CPU1
             .                               .
     page_pool_destroy()          page_pool_put_page()
             .                               .
     synchronize_rcu()                       .
             .                               .
   page_pool_inflight_unmap()                .
             .                               .
             .                    __page_pool_put_page()
             .                               .
             .               page_pool_dma_sync_for_device()
             .                               .
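
To make the window above concrete, here is a minimal single-threaded
userspace sketch that replays the interleaving step by step. It is not
page_pool code: struct model_pool and every model_* function are
hypothetical stand-ins, and the "mapped" flag only models whether the
DMA mapping still exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_pool {
	bool dma_sync;     /* stands in for pool->dma_sync */
	bool mapped;       /* true while the page's DMA mapping exists */
	int  stale_syncs;  /* syncs attempted after the mapping is gone */
};

static void model_pool_init(struct model_pool *p)
{
	p->dma_sync = true;
	p->mapped = true;
	p->stale_syncs = 0;
}

/* CPU0: stands in for page_pool_inflight_unmap(). */
static void model_inflight_unmap(struct model_pool *p)
{
	assert(p->mapped);  /* a page must be unmapped only once */
	p->mapped = false;
}

/* CPU1: stands in for page_pool_dma_sync_for_device() on the put path. */
static void model_sync_for_device(struct model_pool *p)
{
	if (!p->dma_sync)
		return;
	if (!p->mapped)
		p->stale_syncs++;  /* sync on an already unmapped page */
}

/* Replays the timeline: CPU0 unmaps first, then CPU1 reaches the sync. */
static int model_replay_race(void)
{
	struct model_pool p;

	model_pool_init(&p);
	model_inflight_unmap(&p);   /* CPU0 */
	model_sync_for_device(&p);  /* CPU1, nothing stopped it */
	return p.stale_syncs;
}
```

Since nothing clears dma_sync in this model, the replay counts one sync
on a page whose mapping is already gone.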

After this patch, page_pool_dma_sync_for_device() is protected by the
rcu read lock, pool->dma_sync is cleared before synchronize_rcu(), and
page_pool_inflight_unmap() is called after synchronize_rcu(). This
ensures that page_pool_dma_sync_for_device() will not call the dma sync
API after synchronize_rcu() returns:

            CPU0                           CPU1
             .                               .
     page_pool_destroy()          page_pool_put_page()
             .                               .
    pool->dma_sync = false                   .
             .                               .
     synchronize_rcu()                       .
             .                               .
   page_pool_inflight_unmap()                .
             .                               .
             .                  page_pool_recycle_in_ring()
             .                               .
             .                        rcu_read_lock()
             .               page_pool_dma_sync_for_device()
             .                        rcu_read_unlock()
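
The ordering above can be sketched in the same single-threaded
userspace style. Again this is only a model under assumptions: struct
fixed_pool and the fixed_* functions are hypothetical, the "readers"
counter stands in for open rcu_read_lock() sections, and the grace
period check is stricter than real RCU, which only waits for readers
that started before synchronize_rcu().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fixed_pool {
	bool dma_sync;     /* stands in for pool->dma_sync */
	bool mapped;       /* true while the page's DMA mapping exists */
	int  readers;      /* open read-side critical sections */
	int  syncs_done;   /* syncs performed while still mapped */
	int  stale_syncs;  /* syncs attempted after the unmap */
};

static void fixed_pool_init(struct fixed_pool *p)
{
	p->dma_sync = true;
	p->mapped = true;
	p->readers = 0;
	p->syncs_done = 0;
	p->stale_syncs = 0;
}

/* CPU1: models rcu_read_lock() plus the read of pool->dma_sync. */
static bool fixed_put_begin(struct fixed_pool *p)
{
	p->readers++;
	return p->dma_sync;
}

/* CPU1: models the optional sync plus rcu_read_unlock(). */
static void fixed_put_end(struct fixed_pool *p, bool do_sync)
{
	assert(p->readers > 0);
	if (do_sync) {
		if (p->mapped)
			p->syncs_done++;
		else
			p->stale_syncs++;
	}
	p->readers--;
}

/* CPU0: models clearing pool->dma_sync before synchronize_rcu(). */
static void fixed_destroy_clear(struct fixed_pool *p)
{
	p->dma_sync = false;
}

/*
 * CPU0: models synchronize_rcu() followed by the unmap. Returns false
 * while a reader is still open, meaning the grace period has not ended.
 */
static bool fixed_destroy_unmap(struct fixed_pool *p)
{
	if (p->readers != 0)
		return false;
	assert(p->mapped);
	p->mapped = false;
	return true;
}
```

A reader that began before the flag was cleared holds off the unmap
until it has finished its sync, and a reader that begins afterwards
sees dma_sync == false and skips the sync, so stale_syncs stays at zero
in either interleaving.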

Previously patches 2 and 3 were combined as one patch; this version
splits them to make them easier to review.
I am not sure the patch order matters that much, as the fix is not
complete unless both patches are included.

