Date:   Tue, 28 May 2019 20:44:21 +0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     ktkhai@...tuozzo.com, hannes@...xchg.org, mhocko@...e.com,
        kirill.shutemov@...ux.intel.com, hughd@...gle.com,
        shakeelb@...gle.com, akpm@...ux-foundation.org
Cc:     yang.shi@...ux.alibaba.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: [RFC PATCH 0/3] Make deferred split shrinker memcg aware


I got some reports from our internal application team about memcg OOM.
Even though the application has been killed by the OOM killer, a lot of
THPs remain resident, and page reclaim doesn't reclaim them at all.

Some investigation shows they are on the deferred split queue.  Memcg direct
reclaim can't shrink them because the THP deferred split shrinker is not
memcg aware, which may cause premature OOM in the memcg.  The issue can be
reproduced easily with the test below:

$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory.limit_in_bytes
$ cgexec -g memory:thp ./transhuge-stress 4000

transhuge-stress comes from the kernel selftests.

It is easy to hit OOM even though a lot of THPs are still on the deferred
split queue; memcg direct reclaim can't touch them since the deferred split
shrinker is not memcg aware.

Make the deferred split shrinker memcg aware by introducing a per-memcg
deferred split queue.  A THP that belongs to a memcg should be on either the
per-node or the per-memcg deferred split queue.  When the page is migrated
to another memcg, it is moved to the target memcg's deferred split queue
too.
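
To illustrate the queue selection and the move between memcgs, here is a
minimal userspace sketch.  It is a toy model, not the code in the patches:
all struct and function names (split_queue, pick_queue, defer_split,
move_to_memcg, ...) are hypothetical, and it assumes that a THP charged to a
memcg goes on that memcg's queue while an uncharged THP goes on the per-node
queue.

```c
/* Userspace toy model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct split_queue {
	size_t len;			/* number of THPs queued for deferred split */
};

struct memcg {
	struct split_queue deferred;	/* per-memcg deferred split queue */
};

struct node {
	struct split_queue deferred;	/* per-node deferred split queue */
};

struct thp {
	struct memcg *memcg;		/* NULL if not charged to any memcg */
	struct node *node;
	struct split_queue *queued_on;	/* NULL if not queued; never two queues */
};

/* Assumption: charged THPs use the memcg queue, others the node queue. */
static struct split_queue *pick_queue(struct thp *p)
{
	return p->memcg ? &p->memcg->deferred : &p->node->deferred;
}

static void defer_split(struct thp *p)
{
	if (p->queued_on)
		return;			/* already queued somewhere */
	p->queued_on = pick_queue(p);
	p->queued_on->len++;
}

static void undefer(struct thp *p)
{
	if (!p->queued_on)
		return;
	p->queued_on->len--;
	p->queued_on = NULL;
}

/* Moving the charge also moves the THP to the target memcg's queue. */
static void move_to_memcg(struct thp *p, struct memcg *to)
{
	int was_queued = p->queued_on != NULL;

	undefer(p);
	p->memcg = to;
	if (was_queued)
		defer_split(p);
}
```

In this model a THP queued while charged to memcg A leaves A's queue and
shows up on B's queue after move_to_memcg(&thp, &B), so a shrinker walking
B's queue during B's direct reclaim can find it.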

Also, when a page is freed, delete the THP from the deferred split queue
before the memcg uncharge so that the page's memcg information is still
available.
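
The ordering matters because the queue is looked up through the page's
memcg.  A minimal userspace sketch of that dependency follows; it is a toy
model with hypothetical names (dequeue_deferred, uncharge, free_thp,
free_thp_wrong_order), not the kernel's free path.

```c
/* Userspace toy model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct memcg {
	size_t deferred_len;	/* THPs on this memcg's deferred split queue */
};

struct thp {
	struct memcg *memcg;	/* cleared by uncharge() */
	int queued;		/* 1 while on the memcg's deferred queue */
};

/*
 * The queue is found through p->memcg, so this only works while the
 * page is still charged.  Returns 1 if the page is off the queue
 * afterwards, 0 if the queue could not be located.
 */
static int dequeue_deferred(struct thp *p)
{
	if (!p->queued)
		return 1;
	if (!p->memcg)
		return 0;	/* memcg info already gone */
	p->memcg->deferred_len--;
	p->queued = 0;
	return 1;
}

static void uncharge(struct thp *p)
{
	p->memcg = NULL;
}

/* Order described above: dequeue first, then uncharge. */
static int free_thp(struct thp *p)
{
	int ok = dequeue_deferred(p);

	uncharge(p);
	return ok;
}

/* Reversed order, shown only to illustrate the failure mode. */
static int free_thp_wrong_order(struct thp *p)
{
	uncharge(p);
	return dequeue_deferred(p);
}
```

With free_thp() the queue length drops to zero; with the reversed order the
stale entry stays on the queue because the memcg can no longer be found.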

Reuse the second tail page's deferred_list for the per-memcg list, since the
same THP can't be on multiple deferred split queues at the same time.

Remove the THP-specific destructor since it is no longer used with the memcg
aware THP shrinker (please see the commit log of patch 2/3 for details).

Make the deferred split shrinker not depend on memcg kmem since it is not
slab.  It doesn't make sense to skip shrinking THPs just because memcg kmem
is disabled.

With the above changes, the test shown above no longer triggers OOM, even
with cgroup.memory=nokmem.


Yang Shi (3):
      mm: thp: make deferred split shrinker memcg aware
      mm: thp: remove THP destructor
      mm: shrinker: make shrinker not depend on memcg kmem

 include/linux/huge_mm.h    |  24 +++++++++
 include/linux/memcontrol.h |   6 +++
 include/linux/mm.h         |   3 --
 include/linux/mm_types.h   |   7 ++-
 include/linux/shrinker.h   |   3 +-
 mm/huge_memory.c           | 181 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------
 mm/memcontrol.c            |  20 ++++++++
 mm/page_alloc.c            |   3 --
 mm/swap.c                  |   4 ++
 mm/vmscan.c                |  27 +++-------
 10 files changed, 198 insertions(+), 80 deletions(-)
