linux-kernel - Re: Memory management issue in 4.18.15

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181022173550.GA9592@tower.DHCP.thefacebook.com>
Date:   Mon, 22 Oct 2018 17:35:57 +0000
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Spock <dairinin@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Rik van Riel <riel@...riel.com>,
        "Johannes Weiner" <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Sasha Levin <alexander.levin@...rosoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: Memory management issue in 4.18.15

On Mon, Oct 22, 2018 at 10:33:22AM +0200, Michal Hocko wrote:
> Cc som more people.
> 
> I am wondering why 172b06c32b94 ("mm: slowly shrink slabs with a
> relatively small number of objects") has been backported to the stable
> tree when not marked that way. Put that aside it seems likely that the
> upstream kernel will have the same issue I suspect. Roman, could you
> have a look please?

So, the problem is probably caused by the unused inode eviction code:
inode_lru_isolate() invalidates all pages belonging to an unreferenced
clean inode at once, even if the goal was to scan (and potentially free)
just one inode (or any other slab object).

Spock's workload, as described, has few large files in the pagecache,
so it becomes noticeable. A small pressure applied on inode cache
surprisingly results in cleaning up significant percentage of the memory.

It happened before my change too, but was probably less noticeable, because
usually required higher memory pressure to happen. So, too aggressive reclaim
was less unexpected.

How to fix this?

It seems to me, that we shouldn't try invalidating pagecache pages from
the inode reclaim path at all (maybe except inodes with only few pages).
If an inode has a lot of attached pagecache, let it be evicted "naturally",
through file LRU lists.
But I need to perform some real-life testing on how this will work.

Thanks!