[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h6s3nhv4.fsf@toke.dk>
Date: Tue, 23 May 2023 18:16:15 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>, Ilias Apalodimas
<ilias.apalodimas@...aro.org>, netdev@...r.kernel.org, Eric Dumazet
<eric.dumazet@...il.com>, linux-mm@...ck.org, Mel Gorman
<mgorman@...hsingularity.net>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, lorenzo@...nel.org,
linyunsheng@...wei.com, bpf@...r.kernel.org, "David S. Miller"
<davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
willy@...radead.org
Subject: Re: [PATCH RFC net-next/mm V4 2/2] page_pool: Remove workqueue in
new shutdown scheme
> void page_pool_destroy(struct page_pool *pool)
> {
> + unsigned int flags;
> + u32 release_cnt;
> + u32 hold_cnt;
> +
> if (!pool)
> return;
>
> @@ -868,11 +894,45 @@ void page_pool_destroy(struct page_pool *pool)
> if (!page_pool_release(pool))
> return;
>
> - pool->defer_start = jiffies;
> - pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
> + /* PP have pages inflight, thus cannot immediately release memory.
> + * Enter into shutdown phase, depending on remaining in-flight PP
> + * pages to trigger shutdown process (on concurrent CPUs) and last
> + * page will free pool instance.
> + *
> + * There exist two race conditions here, we need to take into
> + * account in the following code.
> + *
> + * 1. Before setting PP_FLAG_SHUTDOWN another CPU released the last
> + * pages into the ptr_ring. Thus, it missed triggering shutdown
> + * process, which can then be stalled forever.
> + *
> + * 2. After setting PP_FLAG_SHUTDOWN another CPU released the last
> + * page, which triggered shutdown process and freed pool
> + * instance. Thus, its not safe to dereference *pool afterwards.
> + *
> + * Handling races by holding a fake in-flight count, via artificially
> + * bumping pages_state_hold_cnt, which assures pool isn't freed under
> + * us. Use RCU Grace-Periods to guarantee concurrent CPUs will
> + * transition safely into the shutdown phase.
> + *
> + * After safely transition into this state the races are resolved. For
> + * race(1) its safe to recheck and empty ptr_ring (it will not free
> + * pool). Race(2) cannot happen, and we can release fake in-flight count
> + * as last step.
> + */
> + hold_cnt = READ_ONCE(pool->pages_state_hold_cnt) + 1;
> + WRITE_ONCE(pool->pages_state_hold_cnt, hold_cnt);
> + synchronize_rcu();
> +
> + flags = READ_ONCE(pool->p.flags) | PP_FLAG_SHUTDOWN;
> + WRITE_ONCE(pool->p.flags, flags);
> + synchronize_rcu();
Hmm, synchronize_rcu() can be quite expensive; why do we need two of
them? Should be fine to just do one after those two writes, as long as
the order of those writes is correct (which WRITE_ONCE should ensure)?
Also, if we're adding this (blocking) operation in the teardown path we
risk adding latency to that path (network interface removal,
BPF_PROG_RUN syscall etc), so not sure if this actually ends up being an
improvement anymore, as opposed to just keeping the workqueue but
dropping the warning?
-Toke
Powered by blists - more mailing lists