lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 12 Nov 2019 09:14:50 -0800
From:   "Jonathan Lemon" <jonathan.lemon@...il.com>
To:     "Jesper Dangaard Brouer" <brouer@...hat.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kernel-team@...com,
        ilias.apalodimas@...aro.org
Subject: Re: [net-next PATCH] page_pool: do not release pool until inflight ==
 0.

On 12 Nov 2019, at 8:48, Jesper Dangaard Brouer wrote:

> On Tue, 12 Nov 2019 08:33:58 -0800
> "Jonathan Lemon" <jonathan.lemon@...il.com> wrote:
>
>> On 12 Nov 2019, at 4:08, Jesper Dangaard Brouer wrote:
>>
>>> On Mon, 11 Nov 2019 21:32:10 -0800
>>> Jonathan Lemon <jonathan.lemon@...il.com> wrote:
>>>
>>>> The page pool keeps track of the number of pages in flight, and
>>>> it isn't safe to remove the pool until all pages are returned.
>>>>
>>>> Disallow removing the pool until all pages are back, so the pool
>>>> is always available for page producers.
>>>>
>>>> Make the page pool responsible for its own delayed destruction
>>>> instead of relying on XDP, so the page pool can be used without
>>>> xdp.
>>>
>>> Can you please change this to:
>>>  [... can be used without] xdp memory model.
>>
>> Okay.
>>
>>
>>>> When all pages are returned, free the pool and notify xdp if the
>>>> pool is registered with the xdp memory system.  Have the callback
>>>> perform a table walk since some drivers (cpsw) may share the pool
>>>> among multiple xdp_rxq_info.
>>>>
>>>> Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic
>>>> warning")
>>>> Signed-off-by: Jonathan Lemon <jonathan.lemon@...il.com>
>>>> ---
>>> [...]
>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>>> index 5bc65587f1c4..bfe96326335d 100644
>>>> --- a/net/core/page_pool.c
>>>> +++ b/net/core/page_pool.c
>>> [...]
>>>
>>> Found an issue, see below.
>>>
>>>> @@ -338,31 +333,10 @@ static void __page_pool_empty_ring(struct
>>>> page_pool *pool)
>>>>  	}
>>>>  }
>>>>
>>>> -static void __warn_in_flight(struct page_pool *pool)
>>>> +static void page_pool_free(struct page_pool *pool)
>>>>  {
>>>> -	u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
>>>> -	u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
>>>> -	s32 distance;
>>>> -
>>>> -	distance = _distance(hold_cnt, release_cnt);
>>>> -
>>>> -	/* Drivers should fix this, but only problematic when DMA is used
>>>> */
>>>> -	WARN(1, "Still in-flight pages:%d hold:%u released:%u",
>>>> -	     distance, hold_cnt, release_cnt);
>>>> -}
>>>> -
>>>> -void __page_pool_free(struct page_pool *pool)
>>>> -{
>>>> -	/* Only last user actually free/release resources */
>>>> -	if (!page_pool_put(pool))
>>>> -		return;
>>>> -
>>>> -	WARN(pool->alloc.count, "API usage violation");
>>>> -	WARN(!ptr_ring_empty(&pool->ring), "ptr_ring is not empty");
>>>> -
>>>> -	/* Can happen due to forced shutdown */
>>>> -	if (!__page_pool_safe_to_destroy(pool))
>>>> -		__warn_in_flight(pool);
>>>> +	if (pool->disconnect)
>>>> +		pool->disconnect(pool);
>>>>  	ptr_ring_cleanup(&pool->ring, NULL);
>>>>
>>>> @@ -371,12 +345,8 @@ void __page_pool_free(struct page_pool *pool)
>>>>
>>>>  	kfree(pool);
>>>>  }
>>>> -EXPORT_SYMBOL(__page_pool_free);
>>>
>>> I don't think this is correct according to RCU.
>>>
>>> Let me reproduce the resulting version of page_pool_free():
>>>
>>>  static void page_pool_free(struct page_pool *pool)
>>>  {
>>> 	if (pool->disconnect)
>>> 		pool->disconnect(pool);
>>>
>>> 	ptr_ring_cleanup(&pool->ring, NULL);
>>>
>>> 	if (pool->p.flags & PP_FLAG_DMA_MAP)
>>> 		put_device(pool->p.dev);
>>>
>>> 	kfree(pool);
>>>  }
>>>
>>> The issue is that pool->disconnect() will call into
>>> mem_allocator_disconnect() -> mem_xa_remove(), and mem_xa_remove()
>>> does
>>> a RCU delayed free.  And this function immediately does a kfree(pool).
>>>
>>> I do know that we can ONLY reach this page_pool_free() function, when
>>> inflight == 0, but that can happen as soon as __page_pool_clean_page()
>>> does the decrement, and after this trace_page_pool_state_release()
>>> still have access the page_pool object (thus, hard to catch
>>> use-after-free).
>>
>> Is this an issue?  The RCU delayed free is for the xa object, it is held
>> in an RCU-protected mem_id_ht, so it can't be freed until all the
>> readers
>> are complete.
>>
>> The change of &pool->pages_state_release_cnt can decrement the inflight
>> pages to 0, and another thread could see inflight == 0 and immediately
>> the remove the pool.  The atomic manipulation should be the last use of
>> the pool - this should be documented, I'll add that as well:
>>
>> skip_dma_unmap:
>>          /* This may be the last page returned, releasing the pool, so
>>           * it is not safe to reference pool afterwards.
>>           */
>>          count = atomic_inc_return(&pool->pages_state_release_cnt);
>>          trace_page_pool_state_release(pool, page, count);
>>
>> The trace_page_pool_state_release() does not dereference pool, it just
>> reports the pointer value, so there shouldn't be any use-after-free.
>
> In the tracepoint we can still dereference the pool object pointer.
> This is made easier via using bpftrace for example see[1] (and with BTF
> this will become more common to do so).

This seems problematic today - essentially the current code is only
safe because it's borrowing the RCU protection from the xdp_mem info,
right?
-- 
Jonathan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ