Date:   Sat, 18 Apr 2020 08:45:59 +0900
From:   Jaewon Kim <jaewon31.kim@...sung.com>
To:     Minchan Kim <minchan@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>
Cc:     mgorman@...e.de, m.szyprowski@...sung.com, mina86@...a86.com,
        riel@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, jaewon31.kim@...il.com,
        ytk.lee@...sung.com
Subject: Re: [PATCH] mm/vmscan: skip lazyfree page on
 reclaim_clean_pages_from_list



On Apr 18, 2020 at 00:13, Minchan Kim wrote:
> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
>> Hi Jaewon,
>>
>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
>>> This patch fixes the nr_isolate_* mismatch problem between cma and dirty
>>> lazyfree pages.
>>>
>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
>>> page, the lazyfree page is changed back to a normal anon page with
>>> SwapBacked set, since commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in
>>> ttu"). Even with this change, the reclaim context still counts isolated
>>> files correctly because it uses is_file_lru to distinguish file pages.
>>> The change to anon does not happen when try_to_unmap_one is used for
>>> migration, so a migration context like compaction also counts isolated
>>> files correctly even though it uses page_is_file_lru instead of
>>> is_file_lru. (page_is_file_cache was recently renamed to
>>> page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
>>> MADV_FREE").)
>>>
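For reference, the two helpers contrasted above look roughly like this in
kernels of this era (simplified sketch; page_is_file_lru() reads the page's
own flag, while is_file_lru() only looks at which LRU list the page came
from):

	static inline int page_is_file_lru(struct page *page)
	{
		return !PageSwapBacked(page);
	}

	static inline bool is_file_lru(enum lru_list lru)
	{
		return lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE;
	}

So once a dirty lazyfree page regains SwapBacked, page_is_file_lru() flips
from file to anon, while is_file_lru() keeps reporting the list the page
was actually isolated from.
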
>>> But the nr_isolate_* mismatch problem does happen on cma alloc. There is
>>> reclaim_clean_pages_from_list, which is used only by cma. It was
>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
>>> contiguous allocation instead of migration") to reclaim clean file pages
>>> without migration. The cma alloc path uses both
>>> reclaim_clean_pages_from_list and migrate_pages, and it uses
>>> page_is_file_lru to count isolated files. If there are dirty lazyfree
>>> pages allocated from the cma memory region, the pages are counted as
>>> isolated file at the beginning but are counted as isolated anon after
>>> reclaim finishes.
>>>
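The accounting that goes out of sync has roughly this shape (an abridged
sketch of the isolation and putback paths, not the exact code):

	/*
	 * At isolation time the page is still lazyfree (!PageSwapBacked),
	 * so it is counted as isolated file:
	 */
	inc_node_page_state(page, NR_ISOLATED_ANON + page_is_file_lru(page));

	/*
	 * try_to_unmap_one() then sets PageSwapBacked on the dirty lazyfree
	 * page, so the putback after the failed discard decrements the anon
	 * counter instead, leaving file at +1 and anon at -1:
	 */
	mod_node_page_state(page_pgdat(page),
			NR_ISOLATED_ANON + page_is_file_lru(page),
			-hpage_nr_pages(page));
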
>>> Mem-Info:
>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>>
>>> As the log above shows, there was far too much isolated file, 37664kB,
>>> which triggers too_many_isolated in reclaim even though there is no
>>> isolated file anywhere in the system. It is reproducible by running two
>>> programs: one doing MADV_FREE and writing, and one doing cma alloc.
>>> Although isolated anon shows as 0, I found that the internal value of
>>> isolated anon was the negative of the isolated file count.
>>>
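A hypothetical sketch of the MADV_FREE side of that reproducer (the cma
alloc side needs a driver that calls cma_alloc(), so it is omitted here):

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 64UL << 20;	/* 64MB anonymous mapping */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		for (;;) {
			memset(p, 1, len);		/* dirty the pages */
			madvise(p, len, MADV_FREE);	/* mark them lazyfree */
			memset(p, 2, len);		/* redirty: dirty lazyfree */
		}
	}
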
>>> Fix this by skipping anon pages in reclaim_clean_pages_from_list. A
>>> lazyfree page could be detected by combining PageAnon(page) and
>>> page_is_file_lru, but in this case PageAnon alone is enough, since we
>>> want to skip all anon pages anyway.
>>>
>>> Reported-by: Yong-Taek Lee <ytk.lee@...sung.com>
>>> Signed-off-by: Jaewon Kim <jaewon31.kim@...sung.com>
>> Thanks for the investigation!
>> The thing is that, since MADV_FREE started supporting swapless systems,
>> a MADV_FREEed page can change its LRU status during reclaim.
>>
>> I worry about losing the optimization we have kept in CMA, but I don't
>> have a better idea either, so I tend to agree with this.
>>
>> Let me Cc Johannes, who might have a better idea.
>>
>>> ---
>>>  mm/vmscan.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index b06868fc4926..9380a18eef5e 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  	LIST_HEAD(clean_pages);
>>>  
>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
>>> +		/* to avoid race with MADV_FREE anon page */
>>> +		if (PageAnon(page))
>>> +			continue;
>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>>>  			ClearPageActive(page);
>>> -- 
>>> 2.13.7
>>>
> Hi Jaewon,
>
> How about this idea? I think it could solve the issue with keeping
> CMA alloc latency optimization.
Hello Minchan

It looks good to me except for a compilation error (noted inline below).

And to apply this patch to other stable branches, we may need some other
dependent patches, though.

Thank you
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 292485f3d24d..10cc932e209a 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -29,6 +29,7 @@ struct reclaim_stat {
>  	unsigned nr_activate[2];
>  	unsigned nr_ref_keep;
>  	unsigned nr_unmap_fail;
> +	unsigned nr_lazyfree_fail;
>  };
>  
>  enum writeback_stat_item {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4c8a1cdccbba..b390f6094f2f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		 */
>  		if (page_mapped(page)) {
>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
>  
>  			if (unlikely(PageTransHuge(page)))
>  				flags |= TTU_SPLIT_HUGE_PMD;
> +
>  			if (!try_to_unmap(page, flags)) {
>  				stat->nr_unmap_fail += nr_pages;
> +				if (lazyfree && PageSwapBacked(page))
> +					stat->nr_lazyfree_fail += nr_pages;
>  				goto activate_locked;
>  			}
>  		}
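(For context: lazyfree is sampled before try_to_unmap(), and the branch in
try_to_unmap_one() that makes the PageSwapBacked re-check meaningful looks
roughly like this in this era, abridged from mm/rmap.c:

	/*
	 * If the page was redirtied, it cannot be
	 * discarded. Remap the page to page table.
	 */
	set_pte_at(mm, address, pvmw.pte, pteval);
	SetPageSwapBacked(page);
	ret = false;
	page_vma_mapped_walk_done(&pvmw);
	break;

So "lazyfree && PageSwapBacked(page)" after a failed unmap identifies
exactly the dirty lazyfree pages that were flipped back to anon.)
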
> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  		.priority = DEF_PRIORITY,
>  		.may_unmap = 1,
>  	};
> -	struct reclaim_stat dummy_stat;
> -	unsigned long ret;
> +	struct reclaim_stat stat;
> +	unsigned long reclaimed;
>  	struct page *page, *next;
>  	LIST_HEAD(clean_pages);
>  
> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  		}
>  	}
>  
> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> +			TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
> -	return ret;
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
> +	/*
> +	 * Since lazyfree pages are isolated from the file LRU from the
> +	 * beginning, they will rotate back to the anonymous LRU in the end
> +	 * if they fail to be discarded, so the isolated counts will be
> +	 * mismatched.  Compensate the isolated count for both LRU lists.
> +	 */
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
> +					stat->nr_lazyfree_fail);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> +					-stat->nr_lazyfree_fail);
These should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail; stat is a
struct here, not a pointer, so -> does not compile.
> +	return reclaimed;
>  }
>  
>  /*
