linux-kernel - Re: mm: pages are not freed from lru_add

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 27 Apr 2016 10:11:04 -0700
From:	Dave Hansen <dave.hansen@...el.com>
To:	"Odzioba, Lukasz" <lukasz.odzioba@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Cc:	"Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Anaczkowski, Lukasz" <lukasz.anaczkowski@...el.com>
Subject: Re: mm: pages are not freed from lru_add_pvecs after process
 termination

On 04/27/2016 10:01 AM, Odzioba, Lukasz wrote:
> Pieces of the puzzle:
> A) after process termination memory is not getting freed nor accounted as free

I don't think this part is necessarily a bug.  As long as we have stats
*somewhere*, and we really do "reclaim" them, I don't think we need to
call these pages "free".

> I am not sure whether it is expected behavior or a side effect of something else not
> going as it should. Temporarily I added lru_add_drain_all() to try_to_free_pages()
> which sort of hammers B case, but A is still present.

It's not expected behavior.  It's an unanticipated side effect of large
numbers of cpu threads, large pages on the LRU, and (relatively) small
zones.

> I am not familiar with this code, but I feel like draining lru_add work should be split
> into smaller pieces and done by kswapd to fix A and drain only as much pages as
> needed in try_to_free_pages to fix B.
> 
> Any comments/ideas/patches for a proper fix are welcome.

Here are my suggestions.  I've passed these along multiple times, but I
guess I'll repeat them again for good measure.

> 1. We need some statistics on the number and total *SIZES* of all pages
>    in the lru pagevecs.  It's too opaque now.
> 2. We need to make darn sure we drain the lru pagevecs before failing
>    any kind of allocation.
> 3. We need some way to drain the lru pagevecs directly.  Maybe the buddy
>    pcp lists too.
> 4. We need to make sure that a zone_reclaim_mode=0 system still drains
>    too.
> 5. The VM stats and their updates are now related to how often
>    drain_zone_pages() gets run.  That might be interacting here too.

6. Perhaps don't use the LRU pagevecs for large pages.  It limits the
   severity of the problem.