linux-kernel - Re: [PATCH v1 0/7] mm/memcontrol: recharge mlocked pages

Message-ID: <20190904231350.GA5246@tower.dhcp.thefacebook.com>
Date:   Wed, 4 Sep 2019 23:13:54 +0000
From:   Roman Gushchin <guro@...com>
To:     Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
CC:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH v1 0/7] mm/memcontrol: recharge mlocked pages

On Wed, Sep 04, 2019 at 04:53:08PM +0300, Konstantin Khlebnikov wrote:
> Currently mlock keeps pages in cgroups where they were accounted.
> This way one container could affect another if they share file cache.
> Typical case is writing (downloading) file in one container and then
> locking in another. After that first container cannot get rid of cache.

Yeah, it's a valid problem, and it's not only about mlocked pages:
the same is true for generic pagecache. The only difference is that
in theory memory pressure should fix everything. But in reality the
pagecache used by the second container can be very hot, so the first
one can't really get rid of it.
In other words, there is no way to pass a pagecache page between cgroups
without evicting it and re-reading it from storage, which is sub-optimal
in many cases.
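
To illustrate the point, here is a toy model of first-touch charging.
It is only a sketch, not kernel code: Cgroup, PageCache and their
methods are hypothetical names made up for this example, and the model
assumes a page is charged once, to whoever touches it first.

```python
# Toy model of first-touch pagecache charging. Not kernel code:
# Cgroup, PageCache and their methods are hypothetical names.

class Cgroup:
    def __init__(self, name):
        self.name = name
        self.charged = 0  # pages currently charged to this cgroup


class PageCache:
    def __init__(self):
        self.owner = {}  # page index -> cgroup charged for that page

    def access(self, index, cgroup):
        # Only the first toucher is charged; later users are not.
        if index not in self.owner:
            self.owner[index] = cgroup
            cgroup.charged += 1

    def evict(self, index):
        # Dropping the page is the only way to release the charge.
        self.owner.pop(index).charged -= 1


writer, reader = Cgroup("writer"), Cgroup("reader")
cache = PageCache()

for i in range(4):   # container 1 writes (downloads) the file
    cache.access(i, writer)
for i in range(4):   # container 2 then uses the same pages
    cache.access(i, reader)

before = (writer.charged, reader.charged)  # writer still pays for everything

# Passing the charge over means evicting and re-reading every page.
for i in range(4):
    cache.evict(i)
    cache.access(i, reader)

after = (writer.charged, reader.charged)
print(before, after)
```

In this model the charge only moves once every page has been evicted
and faulted in again by the reader, which is the round trip to storage
described above.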

We thought about a new madvise() option, which would uncharge the pagecache
and set a new page flag meaning something like "whoever first starts using
the page should be charged for it". But it never materialized as a patchset.

> Also removed cgroup stays pinned by these mlocked pages.

Tbh, I don't think it's a big issue here. It would only matter with a huge
number of 1-page sized mlock areas, and that seems unlikely.

> 
> This patchset implements recharging pages to cgroup of mlock user.
> 
> There are three cases:
> * recharging at first mlock
> * recharging at munlock to any remaining mlock
> * recharging at 'culling' in reclaimer to any existing mlock
> 
> To keep things simple recharging ignores memory limit. After that memory
> usage temporary could be higher than limit but cgroup will reclaim memory
> later or trigger oom, which is valid outcome when somebody mlock too much.

OOM is a concern here. If quitting an application causes an immediate OOM
in another cgroup, that's not so good. Ideally it should work like
memory.high, forcing all threads in the second cgroup into direct reclaim.
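
A toy model of the difference, as a sketch only. Cgroup, recharge() and
the "max"/"high" policy strings are hypothetical names, not kernel
interfaces. The model assumes recharging ignores the limit, then tries
to reclaim the excess, and only differs in what happens when reclaim
is not enough.

```python
# Toy model only: Cgroup, recharge() and the policy names are hypothetical.

class Cgroup:
    def __init__(self, limit, usage, reclaimable):
        self.limit = limit
        self.usage = usage
        self.reclaimable = reclaimable  # pages that reclaim could free
        self.throttled = False
        self.oom = False


def recharge(src, dst, pages, policy):
    """Move the charge for mlocked pages from src to dst, ignoring dst.limit."""
    src.usage -= pages
    dst.usage += pages
    excess = dst.usage - dst.limit
    if excess <= 0:
        return
    # Reclaim as much of the excess as possible.
    freed = min(excess, dst.reclaimable)
    dst.usage -= freed
    dst.reclaimable -= freed
    if dst.usage > dst.limit:
        if policy == "max":
            dst.oom = True        # hard limit: reclaim failed, OOM
        else:
            dst.throttled = True  # memory.high-like: stay over, keep reclaiming


def run(policy):
    first = Cgroup(limit=100, usage=50, reclaimable=0)
    second = Cgroup(limit=100, usage=90, reclaimable=20)
    # The application in `first` quits; 40 mlocked pages move to `second`.
    recharge(first, second, 40, policy)
    return first, second


for policy in ("max", "high"):
    first, second = run(policy)
    print(policy, second.usage, second.oom, second.throttled)
```

With these numbers the second cgroup ends up 10 pages over its limit
either way; the only difference is whether that means an OOM or
threads being pushed into reclaim.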

Thanks!