Message-ID: <2c363f6a-f9e4-4dd2-941d-db446c501885@huawei.com>
Date: Mon, 10 Mar 2025 20:35:31 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>, Yunsheng
 Lin <yunshenglin0825@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
	Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
	<ilias.apalodimas@...aro.org>, "David S. Miller" <davem@...emloft.net>
CC: Yonglong Liu <liuyonglong@...wei.com>, Mina Almasry
	<almasrymina@...gle.com>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
	<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
	<horms@...nel.org>, <linux-mm@...ck.org>, <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap
 them when destroying the pool

On 2025/3/10 17:13, Toke Høiland-Jørgensen wrote:

...

> 
>> Also, we might need a similar locking or synchronisation for the dma
>> sync API in order to skip the dma sync API when page_pool_destroy() is
>> called too.
> 
> Good point, but that seems a separate issue? And simpler to solve (just

If I understand the comments from the DMA experts correctly, the dma_sync
API is not allowed to be called after the dma_unmap API.

> set pool->dma_sync to false when destroying?).

Without locking or other synchronisation, there is a small window between
checking pool->dma_sync and calling the dma sync API, during which the
driver might already have unbound.
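Something like the sketch below might close that window by making the
checking and the calling atomic with regard to page_pool_destroy(). Note
this is only a sketch: pool->destroy_lock is a made-up field, and taking
a spinlock here adds fast-path overhead, so something like RCU might be
preferable in practice:

static void page_pool_dma_sync_for_device(struct page_pool *pool,
					  struct page *page,
					  unsigned int dma_sync_size)
{
	/* page_pool_destroy() would take the same (hypothetical) lock
	 * before clearing pool->dma_sync and unmapping the pages, so
	 * no sync could race with the unmap.
	 */
	spin_lock(&pool->destroy_lock);
	if (pool->dma_sync)
		dma_sync_single_range_for_device(pool->p.dev,
						 page_pool_get_dma_addr(page),
						 pool->p.offset,
						 dma_sync_size,
						 pool->p.dma_dir);
	spin_unlock(&pool->destroy_lock);
}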

> 
>>> To avoid having to walk the entire xarray on unmap to find the page
>>> reference, we stash the ID assigned by xa_alloc() into the page
>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>> the page_pool struct inside struct page. This field overlaps with the
>>> page->mapping pointer, which may turn out to be problematic, so an
>>> alternative is probably needed. Sticking the ID into some of the upper
>>> bits of page->pp_magic may work as an alternative, but that requires
>>> further investigation. Using the 'mapping' field works well enough as
>>> a demonstration for this RFC, though.
>> page->pp_magic seems to have at most 16 bits of space available when
>> trying to reuse its unused bits, as BPF_PTR_POISON and STACK_DEPOT_POISON
>> already seem to use 16 bits, so:
>> 1. For 32-bit systems, it seems there are only 16 bits left even if
>>    ILLEGAL_POINTER_VALUE is defined as 0x0.
>> 2. For 64-bit systems, it seems there are only 12 bits left for powerpc,
>>    as ILLEGAL_POINTER_VALUE is defined as 0x5deadbeef0000000; see
>>    arch/powerpc/Kconfig.
>>
>> So it seems we might need to limit the dma mapping count to 4096 or
>> 65536?
>>
>> And to be honest, I am not sure whether those 'unused' 12/16 bits can
>> really be reused; I guess we need suggestions from mm and arch experts
>> here.
> 
> Why do we need to care about BPF_PTR_POISON and STACK_DEPOT_POISON?
> AFAICT, we only need to make sure we preserve the PP_SIGNATURE value.
> See v2 of the RFC patch, the bit arithmetic there gives me:
> 
> - 24 bits on 32-bit architectures
> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
> - 32 bits on other 64-bit architectures
> 
> Which seems to be plenty?

I am really doubtful it is that simple, but we can always hear from the
experts if it isn't :)
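To make the bit question concrete, stashing the ID in page->pp_magic
could look roughly like the sketch below. The names and values are
illustrative only: bits 0-6 are left alone to preserve PP_SIGNATURE
(the 0x40 part plus the low bits used to tell pp pages apart from
mapped pages), and 21 bits is the PPC64 lower bound from your
calculation, since the poison bits of 0x5deadbeef0000000 start at
bit 28:

/* Hypothetical sketch; exact shift/width need confirmation from
 * mm/arch experts as discussed above.
 */
#define PP_DMA_INDEX_SHIFT	7
#define PP_DMA_INDEX_BITS	21
#define PP_DMA_INDEX_MASK	GENMASK(PP_DMA_INDEX_BITS + \
					PP_DMA_INDEX_SHIFT - 1, \
					PP_DMA_INDEX_SHIFT)

static void page_pool_set_dma_index(struct page *page, unsigned long id)
{
	page->pp_magic &= ~PP_DMA_INDEX_MASK;
	page->pp_magic |= (id << PP_DMA_INDEX_SHIFT) & PP_DMA_INDEX_MASK;
}

static unsigned long page_pool_get_dma_index(struct page *page)
{
	return (page->pp_magic & PP_DMA_INDEX_MASK) >> PP_DMA_INDEX_SHIFT;
}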

> 
>>> Since all the tracking is performed on DMA map/unmap, no additional code
>>> is needed in the fast path, meaning the performance overhead of this
>>> tracking is negligible. The extra memory needed to track the pages is
>>> neatly encapsulated inside xarray, which uses the 'struct xa_node'
>>> structure to track items. This structure is 576 bytes long, with slots
>>> for 64 items, meaning that a full node incurs only 9 bytes of overhead
>>> per slot it tracks (in practice, it probably won't be this efficient,
>>> but in any case it should be an acceptable overhead).
>>
>> Even if items are stored sequentially in an xa_node at first, is it
>> possible that there is fragmentation in those xa_nodes when pages are
>> released and allocated many times during packet processing? If so, is
>> there any fragmentation info about those xa_nodes?
> 
> Some (that's what I meant by "not as efficient"). AFAICT, the xarray
> does do some rebalancing of the underlying radix tree, freeing nodes
> when they are no longer used. However, I am not too familiar with the
> xarray
> code, so I don't know exactly how efficient this is in practice.

I guess that is one of the disadvantages of using an advanced structure
like xarray :(
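For completeness, the map/unmap-time tracking described above could
look roughly like the sketch below, reusing the index helpers from
the earlier sketch. Again hypothetical: pool->dma_mapped is an assumed
xarray member, and error handling is trimmed:

static int page_pool_register_dma(struct page_pool *pool,
				  struct page *page)
{
	u32 id;
	int err;

	/* Only called at DMA map time, so this stays off the fast
	 * path. The upper limit matches the ID bits available in
	 * pp_magic; index 0 is reserved to mean "no ID stashed".
	 */
	err = xa_alloc(&pool->dma_mapped, &id, page,
		       XA_LIMIT(1, BIT(PP_DMA_INDEX_BITS) - 1),
		       GFP_ATOMIC);
	if (err)
		return err;

	page_pool_set_dma_index(page, id);
	return 0;
}

static void page_pool_unregister_dma(struct page_pool *pool,
				     struct page *page)
{
	/* Called at DMA unmap time; the stashed ID avoids walking
	 * the whole xarray to find the page.
	 */
	xa_erase(&pool->dma_mapped, page_pool_get_dma_index(page));
	page_pool_set_dma_index(page, 0);
}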
