Message-ID: <A46945E9-A060-4621-BB43-63C940F6E4B9@gmail.com>
Date: Tue, 12 Nov 2019 09:14:50 -0800
From: "Jonathan Lemon" <jonathan.lemon@...il.com>
To: "Jesper Dangaard Brouer" <brouer@...hat.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, kernel-team@...com,
ilias.apalodimas@...aro.org
Subject: Re: [net-next PATCH] page_pool: do not release pool until inflight == 0.

On 12 Nov 2019, at 8:48, Jesper Dangaard Brouer wrote:
> On Tue, 12 Nov 2019 08:33:58 -0800
> "Jonathan Lemon" <jonathan.lemon@...il.com> wrote:
>
>> On 12 Nov 2019, at 4:08, Jesper Dangaard Brouer wrote:
>>
>>> On Mon, 11 Nov 2019 21:32:10 -0800
>>> Jonathan Lemon <jonathan.lemon@...il.com> wrote:
>>>
>>>> The page pool keeps track of the number of pages in flight, and
>>>> it isn't safe to remove the pool until all pages are returned.
>>>>
>>>> Disallow removing the pool until all pages are back, so the pool
>>>> is always available for page producers.
>>>>
>>>> Make the page pool responsible for its own delayed destruction
>>>> instead of relying on XDP, so the page pool can be used without
>>>> xdp.
>>>
>>> Can you please change this to:
>>> [... can be used without] xdp memory model.
>>
>> Okay.
>>
>>
>>>> When all pages are returned, free the pool and notify xdp if the
>>>> pool is registered with the xdp memory system. Have the callback
>>>> perform a table walk since some drivers (cpsw) may share the pool
>>>> among multiple xdp_rxq_info.
>>>>
>>>> Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
>>>> Signed-off-by: Jonathan Lemon <jonathan.lemon@...il.com>
>>>> ---
>>> [...]
>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>>> index 5bc65587f1c4..bfe96326335d 100644
>>>> --- a/net/core/page_pool.c
>>>> +++ b/net/core/page_pool.c
>>> [...]
>>>
>>> Found an issue, see below.
>>>
>>>> @@ -338,31 +333,10 @@ static void __page_pool_empty_ring(struct page_pool *pool)
>>>> 	}
>>>> }
>>>>
>>>> -static void __warn_in_flight(struct page_pool *pool)
>>>> +static void page_pool_free(struct page_pool *pool)
>>>> {
>>>> - u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
>>>> - u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
>>>> - s32 distance;
>>>> -
>>>> - distance = _distance(hold_cnt, release_cnt);
>>>> -
>>>> -	/* Drivers should fix this, but only problematic when DMA is used */
>>>> - WARN(1, "Still in-flight pages:%d hold:%u released:%u",
>>>> - distance, hold_cnt, release_cnt);
>>>> -}
>>>> -
>>>> -void __page_pool_free(struct page_pool *pool)
>>>> -{
>>>> - /* Only last user actually free/release resources */
>>>> - if (!page_pool_put(pool))
>>>> - return;
>>>> -
>>>> - WARN(pool->alloc.count, "API usage violation");
>>>> - WARN(!ptr_ring_empty(&pool->ring), "ptr_ring is not empty");
>>>> -
>>>> - /* Can happen due to forced shutdown */
>>>> - if (!__page_pool_safe_to_destroy(pool))
>>>> - __warn_in_flight(pool);
>>>> + if (pool->disconnect)
>>>> + pool->disconnect(pool);
>>>>
>>>> ptr_ring_cleanup(&pool->ring, NULL);
>>>>
>>>> @@ -371,12 +345,8 @@ void __page_pool_free(struct page_pool *pool)
>>>>
>>>> kfree(pool);
>>>> }
>>>> -EXPORT_SYMBOL(__page_pool_free);
>>>
>>> I don't think this is correct according to RCU.
>>>
>>> Let me reproduce the resulting version of page_pool_free():
>>>
>>> static void page_pool_free(struct page_pool *pool)
>>> {
>>> if (pool->disconnect)
>>> pool->disconnect(pool);
>>>
>>> ptr_ring_cleanup(&pool->ring, NULL);
>>>
>>> if (pool->p.flags & PP_FLAG_DMA_MAP)
>>> put_device(pool->p.dev);
>>>
>>> kfree(pool);
>>> }
>>>
>>> The issue is that pool->disconnect() will call into
>>> mem_allocator_disconnect() -> mem_xa_remove(), and mem_xa_remove()
>>> does an RCU-delayed free, yet this function immediately does a
>>> kfree(pool).
>>>
>>> I do know that we can ONLY reach this page_pool_free() function when
>>> inflight == 0, but that can happen as soon as __page_pool_clean_page()
>>> does the decrement, and after that trace_page_pool_state_release()
>>> still has access to the page_pool object (thus, a hard-to-catch
>>> use-after-free).
>>
>> Is this an issue? The RCU-delayed free is for the xa object, which is
>> held in an RCU-protected mem_id_ht, so it can't be freed until all
>> readers are complete.
>>
>> The change to &pool->pages_state_release_cnt can decrement the inflight
>> pages to 0, and another thread could see inflight == 0 and immediately
>> remove the pool. The atomic manipulation should be the last use of the
>> pool - this should be documented; I'll add that as well:
>>
>> skip_dma_unmap:
>> /* This may be the last page returned, releasing the pool, so
>> * it is not safe to reference pool afterwards.
>> */
>> count = atomic_inc_return(&pool->pages_state_release_cnt);
>> trace_page_pool_state_release(pool, page, count);
>>
>> The trace_page_pool_state_release() does not dereference pool; it just
>> reports the pointer value, so there shouldn't be any use-after-free.
>
> In the tracepoint we can still dereference the pool object pointer.
> This is made easier by using bpftrace, for example see [1] (and with
> BTF this will become more common to do).

This seems problematic today - essentially the current code is only
safe because it's borrowing the RCU protection from the xdp_mem info,
right?
--
Jonathan