[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <60eb1dbd-b320-ce9b-34f5-bc2e8b6d660b@linux.alibaba.com>
Date:   Mon, 8 Jul 2019 15:52:29 -0700
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     rientjes@...gle.com, kirill.shutemov@...ux.intel.com,
        mhocko@...e.com, hannes@...xchg.org, akpm@...ux-foundation.org
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2 -mm] mm: account lazy free pages separately
Hi guys,
Any comment on this series?
Thanks,
Yang
On 6/27/19 10:12 AM, Yang Shi wrote:
> When doing partial unmap to THP, the pages in the affected range would
> be considered to be reclaimable when memory pressure comes in.  And,
> such pages would be put on deferred split queue and get minus from the
> memory statistics (i.e. /proc/meminfo).
>
> For example, when doing THP split test, /proc/meminfo would show:
>
> Before put on lazy free list:
> MemTotal:       45288336 kB
> MemFree:        43281376 kB
> MemAvailable:   43254048 kB
> ...
> Active(anon):    1096296 kB
> Inactive(anon):     8372 kB
> ...
> AnonPages:       1096264 kB
> ...
> AnonHugePages:   1056768 kB
>
> After put on lazy free list:
> MemTotal:       45288336 kB
> MemFree:        43282612 kB
> MemAvailable:   43255284 kB
> ...
> Active(anon):    1094228 kB
> Inactive(anon):     8372 kB
> ...
> AnonPages:         49668 kB
> ...
> AnonHugePages:     10240 kB
>
> The THPs confusingly look disappeared although they are still on LRU if
> you are not familair the tricks done by kernel.
>
> Accounted the lazy free pages to NR_LAZYFREE, and show them in meminfo
> and other places.  With the change the /proc/meminfo would look like:
> Before put on lazy free list:
> AnonHugePages:   1056768 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> LazyFreePages:         0 kB
>
> After put on lazy free list:
> AnonHugePages:     10240 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> LazyFreePages:   1046528 kB
>
> And, this is also the preparation for the following patch to account
> lazy free pages to available memory.
>
> Here the lazyfree doesn't count MADV_FREE pages since they are not
> actually unmapped until they get reclaimed.  And, they are put on
> inactive file LRU, so they have been accounted for available memory.
>
> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
> ---
> I'm not quite sure whether LazyFreePages is a good name or not since "Lazyfree"
> is typically referred to MADV_FREE pages.  I could use a more spceific name,
> i.e. "DeferredSplitTHP" since it doesn't account MADV_FREE as explained in the
> commit log.  But, a more general name would be good for including other type
> pages in the future.
>
> And, I'm also not sure if it is a good idea to show this in memcg stat or not.
>
>   Documentation/filesystems/proc.txt | 12 ++++++++----
>   drivers/base/node.c                |  3 +++
>   fs/proc/meminfo.c                  |  3 +++
>   include/linux/mmzone.h             |  1 +
>   mm/huge_memory.c                   |  8 ++++++++
>   mm/page_alloc.c                    |  2 ++
>   mm/vmstat.c                        |  1 +
>   7 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 66cad5c..851ddfd 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -895,6 +895,7 @@ HardwareCorrupted:   0 kB
>   AnonHugePages:   49152 kB
>   ShmemHugePages:      0 kB
>   ShmemPmdMapped:      0 kB
> +LazyFreePages:       0 kB
>   
>   
>       MemTotal: Total usable ram (i.e. physical ram minus a few reserved
> @@ -902,12 +903,13 @@ ShmemPmdMapped:      0 kB
>        MemFree: The sum of LowFree+HighFree
>   MemAvailable: An estimate of how much memory is available for starting new
>                 applications, without swapping. Calculated from MemFree,
> -              SReclaimable, the size of the file LRU lists, and the low
> -              watermarks in each zone.
> +              SReclaimable, the size of the file LRU lists, LazyFree pages
> +              and the low watermarks in each zone.
>                 The estimate takes into account that the system needs some
>                 page cache to function well, and that not all reclaimable
> -              slab will be reclaimable, due to items being in use. The
> -              impact of those factors will vary from system to system.
> +              slab and LazyFree pages will be reclaimable, due to items
> +              being in use. The impact of those factors will vary from
> +              system to system.
>        Buffers: Relatively temporary storage for raw disk blocks
>                 shouldn't get tremendously large (20MB or so)
>         Cached: in-memory cache for files read from the disk (the
> @@ -945,6 +947,8 @@ AnonHugePages: Non-file backed huge pages mapped into userspace page tables
>   ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
>                 with huge pages
>   ShmemPmdMapped: Shared memory mapped into userspace with huge pages
> +LazyFreePages: Cleanly freeable pages under memory pressure (i.e. deferred
> +               split THP).
>   KReclaimable: Kernel allocations that the kernel will attempt to reclaim
>                 under memory pressure. Includes SReclaimable (below), and other
>                 direct allocations with a shrinker.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcb..ef701aa 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -427,6 +427,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>   		       "Node %d ShmemHugePages: %8lu kB\n"
>   		       "Node %d ShmemPmdMapped: %8lu kB\n"
>   #endif
> +		       "Node %d LazyFreePages:	%8lu kB\n"
>   			,
>   		       nid, K(node_page_state(pgdat, NR_FILE_DIRTY)),
>   		       nid, K(node_page_state(pgdat, NR_WRITEBACK)),
> @@ -453,6 +454,8 @@ static ssize_t node_read_meminfo(struct device *dev,
>   		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
>   				       HPAGE_PMD_NR)
>   #endif
> +		       ,
> +		       nid, K(node_page_state(pgdat, NR_LAZYFREE))
>   		       );
>   	n += hugetlb_report_node_meminfo(nid, buf + n);
>   	return n;
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 568d90e..b02ebd0 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -138,6 +138,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>   		    global_node_page_state(NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR);
>   #endif
>   
> +	show_val_kb(m, "LazyFreePages:  ",
> +		    global_node_page_state(NR_LAZYFREE));
> +
>   #ifdef CONFIG_CMA
>   	show_val_kb(m, "CmaTotal:       ", totalcma_pages);
>   	show_val_kb(m, "CmaFree:        ",
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7799166..523ea86 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -235,6 +235,7 @@ enum node_stat_item {
>   	NR_SHMEM_THPS,
>   	NR_SHMEM_PMDMAPPED,
>   	NR_ANON_THPS,
> +	NR_LAZYFREE,		/* Lazyfree pages, i.e. deferred split THP */
>   	NR_UNSTABLE_NFS,	/* NFS unstable pages */
>   	NR_VMSCAN_WRITE,
>   	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f20273..78806c7 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2757,6 +2757,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>   		if (!list_empty(page_deferred_list(head))) {
>   			ds_queue->split_queue_len--;
>   			list_del(page_deferred_list(head));
> +			__mod_node_page_state(NODE_DATA(page_to_nid(head)),
> +					NR_LAZYFREE, -HPAGE_PMD_NR);
>   		}
>   		if (mapping)
>   			__dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2806,6 +2808,8 @@ void free_transhuge_page(struct page *page)
>   	if (!list_empty(page_deferred_list(page))) {
>   		ds_queue->split_queue_len--;
>   		list_del(page_deferred_list(page));
> +		__mod_node_page_state(NODE_DATA(page_to_nid(page)),
> +				NR_LAZYFREE, -HPAGE_PMD_NR);
>   	}
>   	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>   	free_compound_page(page);
> @@ -2822,6 +2826,8 @@ void deferred_split_huge_page(struct page *page)
>   	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	if (list_empty(page_deferred_list(page))) {
>   		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +		__mod_node_page_state(NODE_DATA(page_to_nid(page)),
> +				NR_LAZYFREE, HPAGE_PMD_NR);
>   		list_add_tail(page_deferred_list(page), &ds_queue->split_queue);
>   		ds_queue->split_queue_len++;
>   		if (memcg)
> @@ -2873,6 +2879,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   			/* We lost race with put_compound_page() */
>   			list_del_init(page_deferred_list(page));
>   			ds_queue->split_queue_len--;
> +			__mod_node_page_state(NODE_DATA(page_to_nid(page)),
> +					NR_LAZYFREE, -HPAGE_PMD_NR);
>   		}
>   		if (!--sc->nr_to_scan)
>   			break;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7f27f4e..cab50e8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5210,6 +5210,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   			" shmem_pmdmapped: %lukB"
>   			" anon_thp: %lukB"
>   #endif
> +			" lazyfree:%lukB"
>   			" writeback_tmp:%lukB"
>   			" unstable:%lukB"
>   			" all_unreclaimable? %s"
> @@ -5232,6 +5233,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>   					* HPAGE_PMD_NR),
>   			K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR),
>   #endif
> +			K(node_page_state(pgdat, NR_LAZYFREE)),
>   			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
>   			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
>   			pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES ?
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index a7d4933..87703f2 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1158,6 +1158,7 @@ int fragmentation_index(struct zone *zone, unsigned int order)
>   	"nr_shmem_hugepages",
>   	"nr_shmem_pmdmapped",
>   	"nr_anon_transparent_hugepages",
> +	"nr_lazyfree",
>   	"nr_unstable",
>   	"nr_vmscan_write",
>   	"nr_vmscan_immediate_reclaim",
Powered by blists - more mailing lists
 
