Message-ID: <20230320030648.50663-1-caixinchen1@huawei.com>
Date:   Mon, 20 Mar 2023 03:06:47 +0000
From:   Cai Xinchen <caixinchen1@...wei.com>
To:     <songmuchun@...edance.com>, <akpm@...ux-foundation.org>,
        <hannes@...xchg.org>, <longman@...hat.com>, <mhocko@...nel.org>,
        <roman.gushchin@...ux.dev>, <shakeelb@...gle.com>
CC:     <cgroups@...r.kernel.org>, <duanxiongchun@...edance.com>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        <yosryahmed@...gle.com>, <mpenttil@...hat.com>
Subject: [PATCH 0/1] Fix vmstat_percpu incorrect subtraction after reparent

Hello, I have looked at the patch series "Use obj_cgroup APIs to charge
the LRU pages".
Link: https://lore.kernel.org/all/20220621125658.64935-1-songmuchun@bytedance.com/

There are two problems left:

     root
     /  \
    A    B
   / \    \
  C   E    D

1. In some reparenting cases, page cache that is actually used by another
memcg, D, ends up charged to A, the parent of the dying memcg E. D gets
away with using the page for free while A is taxed.

This is hard to solve because the page may be shared by many memcgs, so
it is unclear which memcg it should be recharged to. Recharging also has
side effects. For example, suppose the user runs rmdir on E. If we
recharge the page to D, some pages of processes attached to D may be
reclaimed. The user may be confused that removing E causes the processes
attached to D to reclaim their pages and run slower.

Also, in cgroup v2 a page is charged to a memcg when it is allocated, and
its stats are also counted in the parent. Reparenting seems to follow
that rule.

2. The vmstats_percpu stats problem. When memcg C is offlined, its
pages are reparented to its parent memcg P. At that point P->vmstats
(hierarchical) includes those pages, but P->vmstats_percpu
(non-hierarchical) does not. When those pages are uncharged, P->vmstats
(hierarchical) decreases, which is correct, but P->vmstats_percpu
(non-hierarchical) also decreases, which is wrong, as those stats were
never added to P->vmstats_percpu to begin with. If the reparented memory
exceeds the original non-hierarchical memory in P, some fields shown in
memory.stat, such as cache, will read zero (a negative value is shown
as 0).
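
The accounting in problem 2 can be reproduced with a toy Python model.
This is only a sketch, not kernel code: the Memcg class, its two integer
counters, and offline_and_reparent() are made up for illustration, and
zeroing the child's counters on offline is a simplification.

```python
# Toy model of the two counters described above; not kernel code.
class Memcg:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.vmstats = 0          # hierarchical: this memcg plus descendants
        self.vmstats_percpu = 0   # non-hierarchical: charged directly here

    def charge(self, pages):
        self.vmstats_percpu += pages
        node = self
        while node is not None:   # hierarchical count propagates upward
            node.vmstats += pages
            node = node.parent

    def uncharge(self, pages):
        self.charge(-pages)

    def local_shown(self):
        # memory.stat-style display: a negative value is shown as 0
        return max(self.vmstats_percpu, 0)


def offline_and_reparent(child):
    """Hand the child's pages to its parent WITHOUT touching the parent's
    non-hierarchical counter (the behaviour described in problem 2)."""
    pages = child.vmstats_percpu
    # The parent's hierarchical count already includes these pages,
    # so only the child's own counters are dropped here.
    child.vmstats_percpu = 0
    child.vmstats -= pages
    return pages


p = Memcg("P")
c = Memcg("C", parent=p)
p.charge(10)    # pages charged directly to P
c.charge(30)    # pages charged to C; P.vmstats is now 40

reparented = offline_and_reparent(c)
p.uncharge(reparented)   # the reparented pages are later uncharged via P

print(p.vmstats, p.vmstats_percpu, p.local_shown())   # 10 -20 0
```

P's hierarchical counter ends at 10, which matches the pages P really
holds, but its non-hierarchical counter ends at -20 and is displayed
as 0.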

I think propagating the vmstats_percpu stats of the dying memcg to its
parent can solve this problem. If we do not propagate them and the
reparented memory exceeds the original non-hierarchical memory in P, then
(hierarchical_usage - non-hierarchical_usage -
children_hierarchical_usage) may be meaningless, because
non-hierarchical_usage is shown as 0 while it is actually negative.
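
To make the expression above concrete, here is a small Python
calculation with made-up numbers. It is a sketch under assumptions: the
page counts, the extra live child K, and the residual() helper are
hypothetical and only mirror the expression in the text, with the
non-hierarchical value clamped at 0 as it would be displayed.

```python
def residual(hierarchical, local, children_hierarchical):
    """hierarchical_usage - non-hierarchical_usage - children_hierarchical_usage,
    with the non-hierarchical value clamped at 0 as it would be displayed."""
    shown_local = max(local, 0)
    return hierarchical - shown_local - children_hierarchical


# Assumed page counts: P holds 10 itself, dying child C holds 30,
# and a live child K holds 5.
P_LOCAL, C_LOCAL, K_HIER = 10, 30, 5
p_hier = P_LOCAL + C_LOCAL + K_HIER          # 45

# C is offlined, its 30 pages are reparented to P, then all 30 are uncharged.
p_hier_after = p_hier - C_LOCAL              # 15

# Without propagation, C's pages were never added to P's local counter.
local_without = P_LOCAL - C_LOCAL            # -20
# With propagation, C's local stats are folded into P before the uncharge.
local_with = (P_LOCAL + C_LOCAL) - C_LOCAL   # 10

without_fix = residual(p_hier_after, local_without, K_HIER)
with_fix = residual(p_hier_after, local_with, K_HIER)
print(without_fix, with_fix)                 # 10 0
```

With propagation the three values add up (15 - 10 - 5 = 0). Without it,
the same expression yields 10 pages that correspond to nothing, even
though P itself holds exactly 10 pages.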

I would also like to ask for your opinions on problem 1: how should the
charging of pages be defined when the memcg they were charged to has
died?

Cai Xinchen (1):
  mm: memcontrol: fix vmstats_percpu state incorrect subtraction after
    reparent

 kernel/cgroup/cgroup.c |  5 +++++
 mm/memcontrol.c        | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

-- 
2.17.1
