Message-ID: <8a2f2644-71d0-05d7-49d8-878aafa99652@huawei.com>
Date: Sat, 26 Nov 2022 21:09:51 +0800
From: Yongqiang Liu <liuyongqiang13@...wei.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
<aarcange@...hat.com>, <hughd@...gle.com>, <mgorman@...e.de>,
<mhocko@...e.cz>, <cl@...two.org>, <n-horiguchi@...jp.nec.com>,
<zokeefe@...gle.com>, <rientjes@...gle.com>,
Matthew Wilcox <willy@...radead.org>, <peterx@...hat.com>,
"Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@...wei.com>,
"zhangxiaoxu (A)" <zhangxiaoxu5@...wei.com>,
<kirill.shutemov@...ux.intel.com>,
Yongqiang Liu <liuyongqiang13@...wei.com>,
Lu Jialin <lujialin4@...wei.com>
Subject: [QUESTION] memcg page_counter seems broken in MADV_DONTNEED with THP enabled

Hi,
We use mm_counter to track how much physical memory a process uses, while
the page_counter of a memcg counts how much physical memory a cgroup uses.
If a cgroup contains only a single process, the two should look almost the
same. But with THP enabled, memory.usage_in_bytes of the memcg can
sometimes be twice or more the Rss reported in /proc/[pid]/smaps_rollup,
as follows:
[root@...alhost sda]# cat /sys/fs/cgroup/memory/test/memory.usage_in_bytes
1080930304
[root@...alhost sda]# cat /sys/fs/cgroup/memory/test/cgroup.procs
1290
[root@...alhost sda]# cat /proc/1290/smaps_rollup
55ba80600000-ffffffffff601000 ---p 00000000 00:00 0
[rollup]
Rss: 500648 kB
Pss: 498337 kB
Shared_Clean: 2732 kB
Shared_Dirty: 0 kB
Private_Clean: 364 kB
Private_Dirty: 497552 kB
Referenced: 500648 kB
Anonymous: 492016 kB
LazyFree: 0 kB
AnonHugePages: 129024 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
I found that the difference arises because __split_huge_pmd decreases the
mm_counter, but the page_counter of the memcg is not decreased while the
refcount of the head page is still non-zero. The call paths are as
follows:
do_madvise
  madvise_dontneed_free
    zap_page_range
      unmap_single_vma
        zap_pud_range
          zap_pmd_range
            __split_huge_pmd
              __split_huge_pmd_locked
                __mod_lruvec_page_state
            zap_pte_range
              add_mm_rss_vec
                add_mm_counter          -> decrease the mm_counter
      tlb_finish_mmu
        arch_tlb_finish_mmu
          tlb_flush_mmu_free
            free_pages_and_swap_cache
              release_pages
                folio_put_testzero(page) -> not zero, skip
                  continue;
                __folio_put_large
                  free_transhuge_page
                    free_compound_page
                      mem_cgroup_uncharge
                        page_counter_uncharge -> decrease the page_counter
node_page_state, which is shown in meminfo, is also decreased, yet
__split_huge_pmd seems to free no physical memory unless the whole THP is
freed. I am confused which one reflects the true physical memory usage of
a process.
Kind regards,
Yongqiang Liu