Message-ID: <Y4W+joR1rIug0ydA@dhcp22.suse.cz>
Date:   Tue, 29 Nov 2022 09:10:54 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Yang Shi <shy828301@...il.com>
Cc:     Yongqiang Liu <liuyongqiang13@...wei.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        aarcange@...hat.com, hughd@...gle.com, mgorman@...e.de,
        cl@...two.org, n-horiguchi@...jp.nec.com, zokeefe@...gle.com,
        rientjes@...gle.com, Matthew Wilcox <willy@...radead.org>,
        peterx@...hat.com,
        "Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@...wei.com>,
        "zhangxiaoxu (A)" <zhangxiaoxu5@...wei.com>,
        kirill.shutemov@...ux.intel.com, Lu Jialin <lujialin4@...wei.com>
Subject: Re: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with
 THP enabled

On Mon 28-11-22 12:01:37, Yang Shi wrote:
> On Sat, Nov 26, 2022 at 5:10 AM Yongqiang Liu <liuyongqiang13@...wei.com> wrote:
> >
> > Hi,
> >
> > We use mm_counter to track how much physical memory a process uses.
> > Meanwhile, the page_counter of a memcg counts how much physical memory
> > a cgroup uses.
> > If a cgroup contains only one process, the two look almost the same. But
> > with THP enabled, memory.usage_in_bytes in the memcg may sometimes be
> > twice the rss in /proc/[pid]/smaps_rollup or more, as follows:
[...]
> > node_page_stat, which is shown in meminfo, also decreased. It seems that
> > __split_huge_pmd frees no physical memory unless the whole THP is freed.
> > I am confused about which one is the true physical memory usage of a
> > process.
> 
> This should be caused by the deferred split of THP. When MADV_DONTNEED
> is called on part of the mapping, the huge PMD is split, but the
> THP itself is not split until memory pressure is hit (global
> or memcg limit). So the unmapped subpages are not actually freed
> until that point. The mm counter is decreased by the zapping,
> but the physical pages are not yet freed and uncharged from the
> memcg.

Yes, and this is not really bound to THP. Consider page cache. It can
be accessed via syscalls, in which case it doesn't correspond to rss at
all while it is still charged to a memcg. Or it can be mapped and then
later unmapped, so it disappears from rss while it is still charged until
it gets reclaimed under memory pressure. Or it can be an in-memory object
that is not bound to any process lifetime (e.g. tmpfs). Or it can be
kernel memory charged to a memcg, which is not covered by rss because it
is either not mapped or unknown to the rss counters.
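
To see the gap described above on a live system, one option is to compare
the two counters directly. The following is only a rough sketch: the helper
names are made up for illustration, and the cgroup file path is an
assumption that depends on how the hierarchy is mounted (cgroup v1 exposes
memory.usage_in_bytes; the path used below is a placeholder, not a
standard location for your group).

```python
import re
from pathlib import Path


def parse_rss_kb(smaps_rollup_text):
    """Return the Rss value (in kB) from /proc/[pid]/smaps_rollup text."""
    match = re.search(r"^Rss:\s+(\d+)\s+kB", smaps_rollup_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Rss line found")
    return int(match.group(1))


def charged_not_in_rss(usage_bytes, rss_kb):
    """Bytes charged to the memcg that the process rss does not account for.

    This covers anything charged but unmapped: deferred-split THP subpages,
    page cache, tmpfs, kernel memory. It cannot tell these cases apart.
    """
    return usage_bytes - rss_kb * 1024


def read_gap(pid, usage_file):
    """Read both counters from the live system and return the difference."""
    rollup = Path(f"/proc/{pid}/smaps_rollup").read_text()
    usage = int(Path(usage_file).read_text().strip())
    return charged_not_in_rss(usage, parse_rss_kb(rollup))


if __name__ == "__main__":
    # Assumed cgroup v1 path for a hypothetical group named "test".
    print(read_gap("self", "/sys/fs/cgroup/memory/test/memory.usage_in_bytes"))
```

A large positive result for a cgroup holding a single process does not by
itself indicate a broken counter; it only shows how much charged memory is
currently outside that process's rss.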
-- 
Michal Hocko
SUSE Labs
