Date:	Thu, 5 May 2016 17:25:07 +0000
From:	"Odzioba, Lukasz" <lukasz.odzioba@...el.com>
To:	Michal Hocko <mhocko@...nel.org>
CC:	"Hansen, Dave" <dave.hansen@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Anaczkowski, Lukasz" <lukasz.anaczkowski@...el.com>
Subject: RE: mm: pages are not freed from lru_add_pvecs after process
 termination

On Thu 05-05-16 09:21:00, Michal Hocko wrote: 
> OK, it wasn't that tricky afterall. Maybe I have missed something but
> the following should work. Or maybe the async nature of flushing turns
> out to be just impractical and unreliable and we will end up skipping
> THP (or all compound pages) for pcp LRU add cache. Let's see...
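
(If it does come to skipping THP/compound pages for the pcp LRU add cache, I
assume that would mean flushing the pagevec right away for them, roughly along
these lines - untested sketch only, and the exact __lru_cache_add() body may
differ a bit between trees:

static void __lru_cache_add(struct page *page)
{
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

	get_page(page);
	if (!pagevec_space(pvec))
		__pagevec_lru_add(pvec);
	pagevec_add(pvec, page);
	/* flush right away so compound pages never sit in the per-cpu cache */
	if (PageCompound(page))
		__pagevec_lru_add(pvec);
	put_cpu_var(lru_add_pvec);
}

That takes the lru lock on every THP fault, but for a 2MB page that cost is
probably in the noise.)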

Initially this issue was found on RH's 3.10.x kernel, but now I am using 
4.6-rc6.

Overall it does help, and under heavy load it is slightly better than the
second patch. Unfortunately I am still able to hit 10-20% OOM kills with it
(down from 30-50%), partially thanks to the earlier vmstat_update call - it
went up to 25-25% with this patch below, which moves that call after direct
reclaim:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b4359f8..7a5ab0d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3264,17 +3264,17 @@ retry:
        if (!is_thp_gfp_mask(gfp_mask) || (current->flags & PF_KTHREAD))
                migration_mode = MIGRATE_SYNC_LIGHT;

-       if(!vmstat_updated) {
-               vmstat_updated = true;
-               kick_vmstat_update();
-       }
-
        /* Try direct reclaim and then allocating */
        page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
                                                        &did_some_progress);
        if (page)
                goto got_pg;

+       if(!vmstat_updated) {
+               vmstat_updated = true;
+               kick_vmstat_update();
+       }

I don't quite see a non-invasive way to make sure that we drain all pvecs
before failing the allocation, and doing it asynchronously will race with
allocations anyway, I guess.
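
The heavy-handed alternative would be a one-shot synchronous drain before we
give up, something like the sketch below in __alloc_pages_slowpath()
("drained" is a made-up local flag here, not a real patch), but
lru_add_drain_all() queues work on every CPU and can sleep, so it would only
be legal for reclaim-capable allocations and is exactly the kind of
invasiveness I'd like to avoid:

	/* just before we declare the allocation failed / go OOM */
	if (!drained) {
		drained = true;
		/* flush lru_add_pvecs (and the other pvecs) on all CPUs; may sleep */
		lru_add_drain_all();
		goto retry;
	}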

Thanks,
Lukas
