Message-ID: <875xkgykmi.fsf@toke.dk>
Date: Mon, 10 Mar 2025 16:24:05 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Yunsheng Lin <linyunsheng@...wei.com>, Yunsheng Lin
<yunshenglin0825@...il.com>, Andrew Morton <akpm@...ux-foundation.org>,
Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
<ilias.apalodimas@...aro.org>, "David S. Miller" <davem@...emloft.net>
Cc: Yonglong Liu <liuyonglong@...wei.com>, Mina Almasry
<almasrymina@...gle.com>, Eric Dumazet <edumazet@...gle.com>, Jakub
Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, linux-mm@...ck.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and
unmap them when destroying the pool

Yunsheng Lin <linyunsheng@...wei.com> writes:
> On 2025/3/10 17:13, Toke Høiland-Jørgensen wrote:
>
> ...
>
>>
>>> Also, we might need a similar locking or synchronisation for the dma
>>> sync API in order to skip the dma sync API when page_pool_destroy() is
>>> called too.
>>
>> Good point, but that seems a separate issue? And simpler to solve (just
>
> If I understand the comment from DMA experts correctly, the dma_sync API
> should not be allowed to be called after the dma_unmap API.

Ah, right, I see what you mean; will add a check for this.

>> set pool->dma_sync to false when destroying?).
>
> Without locking or synchronisation, there is some small window between
> pool->dma_sync checking and dma sync API calling, during which the driver
> might have already been unbound.
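
For illustration, the kind of check I have in mind is roughly the below
(a rough sketch, not tested; the helper name is just for illustration,
and it assumes dma_sync is a plain bool so READ_ONCE()/WRITE_ONCE() can
be used on it). And as you say, making this fully race-free will need
some synchronisation on top, e.g. making sure page_pool_destroy() waits
for any concurrent sync callers before doing the unmaps:

/* Sketch only: page_pool_destroy() would do WRITE_ONCE(pool->dma_sync, false)
 * before unmapping the outstanding pages, and the sync path checks the flag.
 */
static void page_pool_maybe_dma_sync(const struct page_pool *pool,
				     dma_addr_t dma_addr, u32 dma_sync_size)
{
	if (!READ_ONCE(pool->dma_sync))
		return;

	dma_sync_size = min(dma_sync_size, pool->p.max_len);
	dma_sync_single_range_for_device(pool->p.dev, dma_addr,
					 pool->p.offset, dma_sync_size,
					 pool->p.dma_dir);
}
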
>
>>
>>>> To avoid having to walk the entire xarray on unmap to find the page
>>>> reference, we stash the ID assigned by xa_alloc() into the page
>>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>>> the page_pool struct inside struct page. This field overlaps with the
>>>> page->mapping pointer, which may turn out to be problematic, so an
>>>> alternative is probably needed. Sticking the ID into some of the upper
>>>> bits of page->pp_magic may work as an alternative, but that requires
>>>> further investigation. Using the 'mapping' field works well enough as
>>>> a demonstration for this RFC, though.
>>> page->pp_magic seems to have at most 16 bits of space available when
>>> trying to reuse its unused bits, as BPF_PTR_POISON and
>>> STACK_DEPOT_POISON already seem to use 16 bits, so:
>>> 1. For 32-bit systems, it seems there are only 16 bits left even if
>>> ILLEGAL_POINTER_VALUE is defined as 0x0.
>>> 2. For 64-bit systems, it seems there are only 12 bits left for powerpc,
>>> as ILLEGAL_POINTER_VALUE is defined as 0x5deadbeef0000000, see
>>> arch/powerpc/Kconfig.
>>>
>>> So it seems we might need to limit the dma mapping count to 4096 or
>>> 65536?
>>>
>>> And to be honest, I am not sure whether those 'unused' 12/16 bits can
>>> really be reused; I guess we might need suggestions from mm and arch
>>> experts here.
>>
>> Why do we need to care about BPF_PTR_POISON and STACK_DEPOT_POISON?
>> AFAICT, we only need to make sure we preserve the PP_SIGNATURE value.
>> See v2 of the RFC patch, the bit arithmetic there gives me:
>>
>> - 24 bits on 32-bit architectures
>> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
>> - 32 bits on other 64-bit architectures
>>
>> Which seems to be plenty?
>
> I am really doubtful it is that simple, but we can always hear from the
> experts if it isn't :)

Do you have any specific reason to think so? :)
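
Just to make it concrete, what I have in mind is something like the
below (rough sketch; the macro names and constants are placeholders, and
the usable width depends on the arch as per the arithmetic above):

/* Sketch only: stash the xarray ID in bits of page->pp_magic that are
 * always zero in PP_SIGNATURE. The shift/width below are placeholder
 * values chosen to stay inside the limits discussed above.
 */
#define PP_DMA_INDEX_SHIFT	8
#define PP_DMA_INDEX_BITS	16
#define PP_DMA_INDEX_MASK	GENMASK(PP_DMA_INDEX_SHIFT + PP_DMA_INDEX_BITS - 1, \
					PP_DMA_INDEX_SHIFT)

static void page_pool_set_dma_index(struct page *page, u32 id)
{
	page->pp_magic |= ((unsigned long)id << PP_DMA_INDEX_SHIFT) & PP_DMA_INDEX_MASK;
}

static u32 page_pool_get_dma_index(const struct page *page)
{
	return (page->pp_magic & PP_DMA_INDEX_MASK) >> PP_DMA_INDEX_SHIFT;
}

The pp_magic signature check would then have to mask out the index bits,
i.e., something like (page->pp_magic & ~(PP_DMA_INDEX_MASK | 0x3UL)) ==
PP_SIGNATURE.
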
>>>> Since all the tracking is performed on DMA map/unmap, no additional code
>>>> is needed in the fast path, meaning the performance overhead of this
>>>> tracking is negligible. The extra memory needed to track the pages is
>>>> neatly encapsulated inside xarray, which uses the 'struct xa_node'
>>>> structure to track items. This structure is 576 bytes long, with slots
>>>> for 64 items, meaning that a full node incurs only 9 bytes of overhead
>>>> per slot it tracks (in practice, it probably won't be this efficient,
>>>> but in any case it should be an acceptable overhead).
>>>
>>> Even if items are stored sequentially in xa_node at first, is it possible
>>> that there may be fragmentation in those xa_node when pages are released
>>> and allocated many times during packet processing? If yes, is there any
>>> fragmentation info about those xa_node?
>>
>> Some (that's what I meant by "not as efficient"). AFAICT, xarray does
>> do some rebalancing of the underlying radix tree, freeing nodes when
>> they are no longer used. However, I am not too familiar with the xarray
>> code, so I don't know exactly how efficient this is in practice.
>
> I guess that is one of the disadvantages of using an advanced struct like
> xarray :(

Sure, there will be some overhead from using xarray, but I think the
simplicity makes up for it, especially since we can limit this to the
cases where it's absolutely needed.
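
E.g., the destroy-time cleanup itself stays pretty small; roughly
something like the below (sketch only; assumes the outstanding mappings
are tracked in an xarray at pool->dma_mapped, populated with xa_alloc()
at map time as described in the patch description):

/* Sketch of the destroy-time unmap walk (not the actual patch). */
static void page_pool_unmap_all(struct page_pool *pool)
{
	struct page *page;
	unsigned long id;

	xa_for_each(&pool->dma_mapped, id, page) {
		dma_unmap_page_attrs(pool->p.dev, page_pool_get_dma_addr(page),
				     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
				     DMA_ATTR_SKIP_CPU_SYNC);
		xa_erase(&pool->dma_mapped, id);
	}
	xa_destroy(&pool->dma_mapped);
}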

-Toke