[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <867eb8dc-0c4c-4d12-7c71-5fbd44fc348d@engleder-embedded.com>
Date: Sun, 19 Mar 2023 20:42:58 +0100
From: Gerhard Engleder <gerhard@...leder-embedded.com>
To: netdev@...r.kernel.org, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, john.fastabend@...il.com, hawk@...nel.org
Cc: davem@...emloft.net, kuba@...nel.org, edumazet@...gle.com,
pabeni@...hat.com, toke@...hat.com, song@...nel.org
Subject: Re: [RFC PATCH net] bpf: Fix unregister memory model in BPF_PROG_RUN
in test runner
On 11.03.23 22:37, Gerhard Engleder wrote:
> After executing xdp-trafficgen the following kernel output appeared:
> [ 214.564375] page_pool_release_retry() stalled pool shutdown 64 inflight 60 sec
>
> xdp_test_run_batch() in combination with xdp-trafficgen uses a batch
> size of 64. So it seems that a single batch does find its way back to
> the page pool. I checked my tsnep driver, but the page pool entries were
> not lost in the driver according to my analysis.
>
> Executing xdp-trafficgen with n=1000 resulted in this output:
> [ 251.652376] page_pool_release_retry() stalled pool shutdown 40 inflight 60 sec
>
> Executing xdp-trafficgen with n=10000 resulted in this output:
> [ 267.332374] page_pool_release_retry() stalled pool shutdown 16 inflight 60 sec
>
> So interestingly in both cases the last batch with a size lower than 64
> does not find its way back to the page pool. So what's the problem with
> the last batch?
>
> After xdp_test_run_batch() clean up is done in xdp_test_run_teardown()
> no matter if page pool entries are still in flight. No problem for
> page_pool_destroy() as the page pool is released later when no entries
> are in flight.
>
> With commit 425d239379db0 unregistering memory model has been added to
> xdp_test_run_teardown(). Otherwise the page pool would not be released.
> But xdp_unreg_mem_model() resets the memory model type immediately to 0
> (which is actually MEM_TYPE_PAGE_SHARED). So the memory model type
> MEM_TYPE_PAGE_POOL is lost and any inflight page pool entries have no
> chance to find its way back to the page pool. I assume that's the reason
> why the last batch does not find its way back to the page pool.
>
> A simple sleep before xdp_unreg_mem_model() solved this problem, but
> this is no valid fix of course. It needs to be ensured that the memory
> model is not in use anymore. This is the case when the page pool has no
> entries in flight.
>
> How could it be ensured that a call to xdp_unreg_mem_model() is safe?
> In my opinion drivers suffer the same problem. Is there a pattern how
> this can be solved? Or did I misinterprete something?
>
> Fixes: 425d239379db0 ("bpf: Fix release of page_pool in BPF_PROG_RUN in test runner")
> Signed-off-by: Gerhard Engleder <gerhard@...leder-embedded.com>
> ---
> net/bpf/test_run.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index b766a84c8536..eaccfdab0be3 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -202,6 +202,8 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c
>
> static void xdp_test_run_teardown(struct xdp_test_data *xdp)
> {
> + usleep_range(10000, 11000);
> +
> xdp_unreg_mem_model(&xdp->mem);
> page_pool_destroy(xdp->pp);
> kfree(xdp->frames);
I did not get any reply. Did I post it the wrong the way?
I did post it as RFC as I have no clear idea how to fix that. But maybe
I'm wrong and there is no problem. Would be great to hear your opinion
about this issue.
Gerhard
Powered by blists - more mailing lists