lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 Apr 2023 23:31:01 +0200
From:   Lorenzo Bianconi <lorenzo@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
        hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
        pabeni@...hat.com, bpf@...r.kernel.org,
        lorenzo.bianconi@...hat.com, nbd@....name
Subject: Re: issue with inflight pages from page_pool

> On Mon, 17 Apr 2023 20:42:39 +0200 Lorenzo Bianconi wrote:
> > > Is drgn available for your target? You could try to scan the pages on
> > > the system and see if you can find what's still pointing to the page
> > > pool (assuming they are indeed leaked and not returned to the page
> > > allocator without releasing :()  
> > 
> > I will test it but since setting sysctl_skb_defer_max to 0 fixes the issue,
> > I think the pages are still properly linked to the pool, they are just not
> > returned to it. I proved it using the other patch I posted [0] where I can see
> > the counter of returned pages incrementing from time to time (in a very long
> > time slot..).
> 
> If it's that then I'm with Eric. There are many ways to keep the pages
> in use, no point working around one of them and not the rest :(

I was not clear here, my fault. What I mean is I can see the returned
pages counter increasing from time to time, but during most of tests,
even after 2h the tcp traffic has stopped, page_pool_release_retry()
still complains not all the pages are returned to the pool and so the
pool has not been deallocated yet.
The chunk of code in my first email is just to demonstrate the issue
and I am completely fine to get a better solution :) I guess we just
need a way to free the pool in a reasonable amount of time. Agree?

> 
> > Unrelated to this issue, but debugging it I think a found a page_pool leak in
> > skb_condense() [1] where we can reallocate the skb data using kmalloc for a
> > page_pool recycled skb.
> 
> I don't see a problem having pp_recycle = 1 and head in slab is legal.
> pp_recycle just means that *if* a page is from the page pool we own 
> the recycling reference. A page from slab will not be treated as a PP
> page cause it doesn't have pp_magic set to the correct pattern.

ack, right. Thx for pointing this out.

Regards,
Lorenzo

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ