Message-ID: <20191029154654.GC33522@cmpxchg.org>
Date:   Tue, 29 Oct 2019 11:46:54 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Hillf Danton <hdanton@...a.com>
Cc:     linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Chris Down <chris@...isdown.name>, Tejun Heo <tj@...nel.org>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Minchan Kim <minchan@...nel.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [RFC v2] memcg: add memcg lru for page reclaiming

On Sat, Oct 26, 2019 at 07:07:45PM +0800, Hillf Danton wrote:
> 
> Currently soft limit reclaim is frozen, see
> Documentation/admin-guide/cgroup-v2.rst for reasons.
> 
> This work adds memcg hook into kswapd's logic to bypass slr,
> paving a brick for its cleanup later.
> 
> After b23afb93d317 ("memcg: punt high overage reclaim to
> return-to-userland path"), high limit breachers are reclaimed one
> after another spiraling up through the memcg hierarchy before
> returning to userspace.
> 
> We can not add new hook yet if it is infeasible to defer that
> reclaiming a bit further until kswapd becomes active.
> 
> It can be deferred however because high limit breach looks benign
> in the absence of memory pressure, or we ensure it will be
> reclaimed soon in the presence of kswapd.

I have no idea what this patch is actually trying to do. But this
premise here, as well as the implementation, are seriously flawed.

memory.high needs to be enforced synchronously. Current users expect
workloads to be strictly contained or throttled by memory.high in
order to ensure consistent behavior regardless of the host
environment, as well as prevent interference with other workloads
whose startup time could be slowed down by this lack of containment.

On the implementation side, it appears you patched out reclaim but
left in the throttling that's supposed to make up for failing
reclaim. That means that once a cgroup tree's cache footprint grows
past its memory.high, instead of simply picking up the cold cache
pages, it'll get throttled heavily and see extreme memory pressure. It
could take ages for it to grow to the point where kswapd wakes up.
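
To make the failure mode above concrete, here is a toy Python model. It is
only a sketch and not kernel code: the Cgroup class, charge(), simulate()
and the page counts are all hypothetical, and throttling is merely counted
rather than modeled as a delay. It contrasts enforcing memory.high with
reclaim in place against the same path with reclaim patched out and only
the throttling left in.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    """Toy stand-in for a cgroup; all sizes are in pages (hypothetical model)."""
    high: int                 # stands in for memory.high
    hot: int = 0              # pages in active use, treated as unreclaimable here
    cold_cache: int = 0       # cold page cache, cheap to reclaim
    throttle_events: int = 0  # how often a charge ended in throttling

    @property
    def usage(self) -> int:
        return self.hot + self.cold_cache


def charge(cg: Cgroup, pages: int, reclaim_enabled: bool) -> None:
    """Charge new cache pages, then enforce the high limit synchronously."""
    cg.cold_cache += pages
    overage = cg.usage - cg.high
    if overage <= 0:
        return
    if reclaim_enabled:
        # Pick up cold cache first; this is the step the patch removes.
        reclaimed = min(overage, cg.cold_cache)
        cg.cold_cache -= reclaimed
        overage -= reclaimed
    if overage > 0:
        # Throttling is the fallback for when reclaim could not fix the overage.
        cg.throttle_events += 1


def simulate(reclaim_enabled: bool, steps: int = 20) -> Cgroup:
    """Stream cache pages into a cgroup with 50 hot pages and a limit of 100."""
    cg = Cgroup(high=100, hot=50)
    for _ in range(steps):
        charge(cg, 10, reclaim_enabled)
    return cg
```

In this model, simulate(True) stays pinned at the limit without a single
throttle event, because every overage is covered by dropping cold cache.
simulate(False) is throttled on every charge past the limit and its
footprint still keeps growing, since nothing ever reclaims the cold pages.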

Nacked-by: Johannes Weiner <hannes@...xchg.org>
