linux-kernel - Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2


Date:   Thu, 21 Dec 2017 05:37:26 -0800
From:   Tejun Heo <tj@...nel.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Michal Hocko <mhocko@...nel.org>, Li Zefan <lizefan@...wei.com>,
        Roman Gushchin <guro@...com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Greg Thelen <gthelen@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Cgroups <cgroups@...r.kernel.org>, linux-doc@...r.kernel.org
Subject: Re: [RFC PATCH] mm: memcontrol: memory+swap accounting for cgroup-v2

Hello, Shakeel.

On Wed, Dec 20, 2017 at 05:15:41PM -0800, Shakeel Butt wrote:
> Let's say we have a job that allocates 100 MiB memory and suppose 80
> MiB is anon and 20 MiB is non-anon (file & kmem).
> 
> [With memsw] Scheduler sets the memsw limit of the job to 100 MiB and
> memory to max. Now suppose the job tries to allocate more memory than
> 100 MiB, it will hit the memsw limit and will try to reclaim non-anon
> memory. The memcg OOM behavior will only depend on the reclaim of
> non-anon memory and will be independent of the underlying swap device.

Sure, the direct reclaim on memsw limit won't reclaim anon pages, but
think about how the state at that point would have formed.  You're
claiming that memsw makes memory allocation and balancing behavior
invariant to the performance of the swap device that the machine
has.  That's simply not possible.

On top of that, what's the point?

1. As I wrote earlier, given the current OOM killer implementation,
   whether OOM kicks in or not is not even that relevant in
   determining the health of the workload.  There are frequent failure
   modes where OOM killer fails to kick in while the workload isn't
   making any meaningful forward progress.

2. On hitting memsw limit, the OOM decision is dependent on the
   performance of the file backing devices.  Why is that necessarily
   better than being dependent on swap or both, which would increase
   the reclaim efficiency anyway?  You can't avoid being affected by
   the underlying hardware one way or the other.

3. The only thing memsw does is that memsw direct reclaim will only
   consider file backed pages, which I think is more of an accident
   (in an attempt to avoid lower swap setting meaning higher actual
   memory usage) than the intended outcome.  This is obviously
   suboptimal and an implementation detail.  I don't think it's
   something we want to expose to userland as a feature.
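
To make point 3 concrete, here is a rough toy model of the accounting
(not kernel code).  The Cgroup class and the swap_out() and drop_file()
helpers are hypothetical names made up for illustration; the numbers
are the 80 MiB anon / 20 MiB non-anon job from the quoted example.
It only shows why swapping out anon pages can't bring a group back
under a combined memory+swap limit, while dropping file pages can.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    """Toy accounting state for one group, all values in MiB."""
    anon: int      # resident anonymous memory
    file: int      # resident non-anon memory (file, kmem)
    swap: int = 0  # anonymous memory currently swapped out

    @property
    def memory(self) -> int:
        # What a plain memory limit would be charged against.
        return self.anon + self.file

    @property
    def memsw(self) -> int:
        # What a combined memory+swap limit would be charged against.
        return self.memory + self.swap


def swap_out(cg: Cgroup, amount: int) -> int:
    """Move anon memory to swap; memory drops, memsw stays the same."""
    moved = min(amount, cg.anon)
    cg.anon -= moved
    cg.swap += moved
    return moved


def drop_file(cg: Cgroup, amount: int) -> int:
    """Reclaim non-anon memory; both memory and memsw drop."""
    dropped = min(amount, cg.file)
    cg.file -= dropped
    return dropped


job = Cgroup(anon=80, file=20)
memsw_before = job.memsw

swap_out(job, 30)
memsw_after_swap = job.memsw   # unchanged: the charge only moved to swap

drop_file(job, 10)
memsw_after_drop = job.memsw   # lower: file pages actually left the total
```

In this model, swap_out() never changes memsw, so a reclaimer trying
to get under a memsw limit has nothing to gain from anon pages.  That
is the implementation detail described above, not a statement about
how well either backing device performs.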

Thanks.

-- 
tejun