linux-kernel - Re: [RFC v2] memcg: add memcg lru for page reclaiming

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <20191029154654.GC33522@cmpxchg.org>
Date:   Tue, 29 Oct 2019 11:46:54 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Hillf Danton <hdanton@...a.com>
Cc:     linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Chris Down <chris@...isdown.name>, Tejun Heo <tj@...nel.org>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Minchan Kim <minchan@...nel.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [RFC v2] memcg: add memcg lru for page reclaiming

On Sat, Oct 26, 2019 at 07:07:45PM +0800, Hillf Danton wrote:
> 
> Currently soft limit reclaim is frozen, see
> Documentation/admin-guide/cgroup-v2.rst for reasons.
> 
> This work adds memcg hook into kswapd's logic to bypass slr,
> paving a brick for its cleanup later.
> 
> After b23afb93d317 ("memcg: punt high overage reclaim to
> return-to-userland path"), high limit breachers are reclaimed one
> after another spiraling up through the memcg hierarchy before
> returning to userspace.
> 
> We can not add new hook yet if it is infeasible to defer that
> reclaiming a bit further until kswapd becomes active.
> 
> It can be defered however because high limit breach looks benign
> in the absence of memory pressure, or we ensure it will be
> reclaimed soon in the presence of kswapd.

I have no idea what this patch is actually trying to do. But this
premise here, as well as the implementation, are seriously flawed.

memory.high needs to be enforced synchronously. Current users expect
workloads to be strictly contained or throttled by memory.high in
order to ensure consistent behavior regardless of the host
environment, as well as prevent interference with other workloads
whose startup time could be slowed down by this lack of containment.

On the implementation side, it appears you patched out reclaim but
left in the throttling that's supposed to make up for failing
reclaim. That means that once a cgroup tree's cache footprint grows
past its memory.high, instead of simply picking up the cold cache
pages, it'll get throttled heavily and see extreme memory pressure. It
could take ages for it to grow to the point where kswapd wakes up.

Nacked-by: Johannes Weiner <hannes@...xchg.org>