Date:   Fri, 28 Apr 2023 15:48:06 +0200
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        netdev@...r.kernel.org, Eric Dumazet <eric.dumazet@...il.com>,
        linux-mm@...ck.org, Mel Gorman <mgorman@...hsingularity.net>
Cc:     brouer@...hat.com, lorenzo@...nel.org, linyunsheng@...wei.com,
        bpf@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>, willy@...radead.org
Subject: Re: [PATCH RFC net-next/mm V2 1/2] page_pool: Remove workqueue in new
 shutdown scheme


On 27/04/2023 22.53, Toke Høiland-Jørgensen wrote:
>> @@ -490,11 +508,15 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
>>   skip_dma_unmap:
>>   	page_pool_clear_pp_info(page);
>>   
>> -	/* This may be the last page returned, releasing the pool, so
>> -	 * it is not safe to reference pool afterwards.
>> -	 */
>> -	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
>> -	trace_page_pool_state_release(pool, page, count);
>> +	if (flags & PP_FLAG_SHUTDOWN)
>> +		hold_cnt = pp_read_hold_cnt(pool);
>> +
>> +	release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
>> +	trace_page_pool_state_release(pool, page, release_cnt);
>> +
>> +	/* In shutdown phase, last page will free pool instance */
>> +	if (flags & PP_FLAG_SHUTDOWN)
>> +		page_pool_free_attempt(pool, hold_cnt, release_cnt);
 >
> Since the assumption is that no new pages will be allocated once the
> PP_FLAG_SHUTDOWN is set (i.e., hold_cnt cannot increase in that case),
> I don't think it matters what order you read the hold and release counts
> in? So you could simplify the above to just:
> 
>> +	if (flags & PP_FLAG_SHUTDOWN)
>> +		page_pool_free_attempt(pool, pp_read_hold_cnt(pool), release_cnt);
> and drop the second check of the flag further up?
> 
> You could probably even lose the hold_cnt argument entirely from
> page_pool_free_attempt() and just have it call pp_read_hold_cnt() directly?
>

Unfortunately, I think we have to keep this approach.

The purpose is to read out data from *pool beforehand, such that it is safe
to call page_pool_free_attempt() even after the *pool memory has been freed.
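
To make the ordering concrete: page_pool_free_attempt() only has to compare
the two counter values it is handed, so it never reads *pool unless it is
the caller that gets to do the free. A rough sketch (not the exact body
from the patch) could look like:

static void page_pool_free_attempt(struct page_pool *pool,
				   u32 hold_cnt, u32 release_cnt)
{
	/* hold_cnt and release_cnt were snapshotted by the caller, so
	 * work purely on those copies here.
	 */
	s32 inflight = (s32)(hold_cnt - release_cnt);

	/* Pages still outstanding: return without dereferencing *pool,
	 * as another CPU releasing the last page may free it.
	 */
	if (inflight > 0)
		return;

	/* This caller observed inflight == 0, so it owns the free */
	page_pool_free(pool);
}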

I believe there is a race window between the atomic_inc_return() and the
free in page_pool_free_attempt(). (As we have tracepoints in this critical
section, we might even be able to increase the chance of hitting the race.)

Imagine two CPUs freeing the last two PP pages.
Hold=2, which means that when release_cnt reaches 2, inflight is zero.

  CPU-1 : release_cnt 1 = atomic_inc_return();
  CPU-1 : gets preempted (or runs a slow bpf-prog in the tracepoint)
  CPU-2 : release_cnt 2 = atomic_inc_return();
  CPU-2 : page_pool_free_attempt(pool, 2, release_cnt=2);
  CPU-2 : finds no inflight pages -> calls page_pool_free(pool)
  CPU-1 : page_pool_free_attempt(pool, 2, release_cnt=1);
  CPU-1 : *use-after-free* deref of pool->pages_state_hold_cnt
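
For comparison, with the simplified variant (reading the hold count via
pp_read_hold_cnt(pool) at the call site, as suggested above), CPU-1's last
step would roughly become:

	release_cnt = atomic_inc_return(&pool->pages_state_release_cnt);
	trace_page_pool_state_release(pool, page, release_cnt);

	if (flags & PP_FLAG_SHUTDOWN)
		/* CPU-2 may already have freed *pool by now, so
		 * pp_read_hold_cnt(pool) reads pages_state_hold_cnt
		 * from freed memory.
		 */
		page_pool_free_attempt(pool, pp_read_hold_cnt(pool),
				       release_cnt);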


>>   }
>>   EXPORT_SYMBOL(page_pool_release_page);
