Message-ID: <3449df3e-1133-3971-06bb-62dd0357de40@redhat.com>
Date: Wed, 19 Apr 2023 13:08:22 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Lorenzo Bianconi <lorenzo@...nel.org>,
Jakub Kicinski <kuba@...nel.org>
Cc: brouer@...hat.com, Eric Dumazet <edumazet@...gle.com>,
netdev@...r.kernel.org, hawk@...nel.org,
ilias.apalodimas@...aro.org, davem@...emloft.net,
pabeni@...hat.com, bpf@...r.kernel.org,
lorenzo.bianconi@...hat.com, nbd@....name
Subject: Re: issue with inflight pages from page_pool
On 18/04/2023 09.36, Lorenzo Bianconi wrote:
>> On Mon, 17 Apr 2023 23:31:01 +0200 Lorenzo Bianconi wrote:
>>>> If it's that then I'm with Eric. There are many ways to keep the pages
>>>> in use, no point working around one of them and not the rest :(
>>>
>>> I was not clear here, my fault. What I mean is I can see the returned
>>> pages counter increasing from time to time, but during most of tests,
>>> even after 2h the tcp traffic has stopped, page_pool_release_retry()
>>> still complains not all the pages are returned to the pool and so the
>>> pool has not been deallocated yet.
>>> The chunk of code in my first email is just to demonstrate the issue
>>> and I am completely fine to get a better solution :)
>>
>> Your problem is perhaps made worse by threaded NAPI, you have
>> defer-free skbs sprayed across all cores and no NAPI there to
>> flush them :(
>
> yes, exactly :)
>
>>
>>> I guess we just need a way to free the pool in a reasonable amount
>>> of time. Agree?
>>
>> Whether we need to guarantee the release is the real question.
>
> yes, this is the main goal of my email. The defer-free skb behaviour seems to
> conflict with the page_pool pending-pages monitoring mechanism, or at least
> the two do not work well together.
>
> @Jesper, Ilias: any input on it?
>
>> Maybe it's more of a false-positive warning.
>>
>> Flushing the defer list is probably fine as a hack, but it's not
>> a full fix as Eric explained. False positive can still happen.
>
> agree, it was just a way to give an idea of the issue, not a proper solution.
>
> Regards,
> Lorenzo
>
>>
>> I'm ambivalent. My only real request would be to make the flushing
>> a helper in net/core/dev.c rather than open coded in page_pool.c.
I agree. We need a central defer_list flushing helper.

It is too easy to dismiss this as a false-positive warning.
IMHO this exposes an issue with the sd->defer_list system.
Lorenzo's test is adding+removing veth devices, which creates and runs
NAPI processing on random CPUs. After the veth netdevices (+NAPI) are
removed, nothing will naturally invoke net_rx_softirq on those CPUs.
Thus, we have SKBs waiting on those CPUs' sd->defer_list. Furthermore,
we will not create new SKBs with this skb->alloc_cpu, so the RX softirq
IPI call (trigger_rx_softirq) never fires, even if this CPU processes
and frees SKBs.
I see two solutions:
(1) When netdevice/NAPI unregister happens call defer_list flushing
helper.
(2) Use napi_watchdog to detect if defer_list is (many jiffies) old,
and then call defer_list flushing helper.
>>
>> Somewhat related - Eric, do we need to handle defer_list in dev_cpu_dead()?
Looks to me like dev_cpu_dead() also needs this flushing helper for
sd->defer_list, or at least needs to move the sd->defer_list entries to
an sd that will eventually run.
--Jesper