netdev - Re: issue with inflight pages from page

Message-ID: <ZD26lb2qdsdX16qa@lore-desk>
Date:   Mon, 17 Apr 2023 23:31:01 +0200
From:   Lorenzo Bianconi <lorenzo@...nel.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
        hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
        pabeni@...hat.com, bpf@...r.kernel.org,
        lorenzo.bianconi@...hat.com, nbd@....name
Subject: Re: issue with inflight pages from page_pool

> On Mon, 17 Apr 2023 20:42:39 +0200 Lorenzo Bianconi wrote:
> > > Is drgn available for your target? You could try to scan the pages on
> > > the system and see if you can find what's still pointing to the page
> > > pool (assuming they are indeed leaked and not returned to the page
> > > allocator without releasing :()  
> > 
> > I will test it but since setting sysctl_skb_defer_max to 0 fixes the issue,
> > I think the pages are still properly linked to the pool, they are just not
> > returned to it. I proved it using the other patch I posted [0] where I can see
> > the counter of returned pages incrementing from time to time (in a very long
> > time slot..).
> 
> If it's that then I'm with Eric. There are many ways to keep the pages
> in use, no point working around one of them and not the rest :(

I was not clear here, my fault. What I mean is that I can see the returned
pages counter increasing from time to time, but in most of the tests,
even 2h after the TCP traffic has stopped, page_pool_release_retry()
still complains that not all the pages have been returned to the pool, so
the pool has not been deallocated yet.
The chunk of code in my first email is just meant to demonstrate the issue,
and I am completely fine with a better solution :) I guess we just
need a way to free the pool in a reasonable amount of time. Agree?
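
To make the accounting above concrete, here is a minimal userspace sketch
of the idea, not the kernel code itself. The struct and function names
(pool_model, pool_inflight, pool_can_release) are hypothetical; the only
assumption taken from page_pool is that the number of inflight pages is the
difference between a "hold" counter and a "release" counter, and that the
pool can only be freed once that difference reaches zero.

```c
#include <stdint.h>

/* Hypothetical userspace model, NOT the kernel's struct page_pool. */
struct pool_model {
	uint32_t hold_cnt;    /* pages handed out by the pool */
	uint32_t release_cnt; /* pages given back to the pool */
};

/*
 * Pages still owned by someone else (e.g. queued skbs).
 * The subtraction is done on unsigned 32-bit values and then read as
 * signed, so the result stays correct if the counters wrap around.
 */
static int32_t pool_inflight(const struct pool_model *p)
{
	return (int32_t)(p->hold_cnt - p->release_cnt);
}

/* Returns 1 if the pool could be freed now, 0 if a retry is needed. */
static int pool_can_release(const struct pool_model *p)
{
	return pool_inflight(p) == 0;
}
```

In this model, a pool whose pages are parked somewhere for hours keeps
returning 0 from pool_can_release(), which is the situation described
above: the counter moves from time to time, but never reaches zero.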

> 
> > Unrelated to this issue, but while debugging it I think I found a page_pool leak in
> > skb_condense() [1] where we can reallocate the skb data using kmalloc for a
> > page_pool recycled skb.
> 
> I don't see a problem having pp_recycle = 1 and head in slab is legal.
> pp_recycle just means that *if* a page is from the page pool we own 
> the recycling reference. A page from slab will not be treated as a PP
> page cause it doesn't have pp_magic set to the correct pattern.

ack, right. Thx for pointing this out.
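
For reference, the rule described above can be sketched as a small
userspace model. This is an illustration only: page_model,
PP_SIGNATURE_MODEL, and skb_page_free_action are hypothetical names, the
signature value is a placeholder, and masking the two low bits before the
comparison is an assumption of this sketch rather than a statement about
any particular kernel version.

```c
/* Hypothetical stand-in for the pp_magic field of struct page. */
struct page_model {
	unsigned long pp_magic;
};

/* Placeholder signature value, used only by this model. */
#define PP_SIGNATURE_MODEL 0x40UL

enum free_action {
	FREE_NORMAL,  /* page is freed through the regular path */
	FREE_RECYCLE, /* page is handed back to its page pool */
};

/* A page counts as a page_pool page only if its magic matches. */
static int page_is_pp_page(const struct page_model *page)
{
	/* Assumption: low bits may carry unrelated flags, so mask them. */
	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE_MODEL;
}

/*
 * pp_recycle alone does not decide anything: it only means that *if*
 * the page belongs to a page pool, the recycling reference is owned.
 */
static enum free_action skb_page_free_action(int pp_recycle,
					     const struct page_model *page)
{
	if (pp_recycle && page_is_pp_page(page))
		return FREE_RECYCLE;
	return FREE_NORMAL;
}
```

With this model, a slab-backed head (no matching magic) takes the normal
path even when pp_recycle is set, so pp_recycle = 1 with the head in slab
does not leak anything.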

Regards,
Lorenzo
