netdev - Re: [PATCH RFC net-next/mm V4 2/2] page_pool: Remove workqueue in new shutdown scheme

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87h6s3nhv4.fsf@toke.dk>
Date: Tue, 23 May 2023 18:16:15 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>, Ilias Apalodimas
 <ilias.apalodimas@...aro.org>, netdev@...r.kernel.org, Eric Dumazet
 <eric.dumazet@...il.com>, linux-mm@...ck.org, Mel Gorman
 <mgorman@...hsingularity.net>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, lorenzo@...nel.org,
 linyunsheng@...wei.com, bpf@...r.kernel.org, "David S. Miller"
 <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
 <pabeni@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
 willy@...radead.org
Subject: Re: [PATCH RFC net-next/mm V4 2/2] page_pool: Remove workqueue in
 new shutdown scheme

>  void page_pool_destroy(struct page_pool *pool)
>  {
> +	unsigned int flags;
> +	u32 release_cnt;
> +	u32 hold_cnt;
> +
>  	if (!pool)
>  		return;
>  
> @@ -868,11 +894,45 @@ void page_pool_destroy(struct page_pool *pool)
>  	if (!page_pool_release(pool))
>  		return;
>  
> -	pool->defer_start = jiffies;
> -	pool->defer_warn  = jiffies + DEFER_WARN_INTERVAL;
> +	/* PP have pages inflight, thus cannot immediately release memory.
> +	 * Enter into shutdown phase, depending on remaining in-flight PP
> +	 * pages to trigger shutdown process (on concurrent CPUs) and last
> +	 * page will free pool instance.
> +	 *
> +	 * There exist two race conditions here, we need to take into
> +	 * account in the following code.
> +	 *
> +	 * 1. Before setting PP_FLAG_SHUTDOWN another CPU released the last
> +	 *    pages into the ptr_ring.  Thus, it missed triggering shutdown
> +	 *    process, which can then be stalled forever.
> +	 *
> +	 * 2. After setting PP_FLAG_SHUTDOWN another CPU released the last
> +	 *    page, which triggered shutdown process and freed pool
> +	 *    instance. Thus, its not safe to dereference *pool afterwards.
> +	 *
> +	 * Handling races by holding a fake in-flight count, via artificially
> +	 * bumping pages_state_hold_cnt, which assures pool isn't freed under
> +	 * us.  Use RCU Grace-Periods to guarantee concurrent CPUs will
> +	 * transition safely into the shutdown phase.
> +	 *
> +	 * After safely transition into this state the races are resolved.  For
> +	 * race(1) its safe to recheck and empty ptr_ring (it will not free
> +	 * pool). Race(2) cannot happen, and we can release fake in-flight count
> +	 * as last step.
> +	 */
> +	hold_cnt = READ_ONCE(pool->pages_state_hold_cnt) + 1;
> +	WRITE_ONCE(pool->pages_state_hold_cnt, hold_cnt);
> +	synchronize_rcu();
> +
> +	flags = READ_ONCE(pool->p.flags) | PP_FLAG_SHUTDOWN;
> +	WRITE_ONCE(pool->p.flags, flags);
> +	synchronize_rcu();

Hmm, synchronize_rcu() can be quite expensive; why do we need two of
them? Should be fine to just do one after those two writes, as long as
the order of those writes is correct (which WRITE_ONCE should ensure)?

Also, if we're adding this (blocking) operation in the teardown path we
risk adding latency to that path (network interface removal,
BPF_PROG_RUN syscall etc), so not sure if this actually ends up being an
improvement anymore, as opposed to just keeping the workqueue but
dropping the warning?

-Toke