Message-ID: <20200326062451.GA110624@google.com>
Date: Wed, 25 Mar 2020 23:24:51 -0700
From: Minchan Kim <minchan@...nel.org>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, mhocko@...e.com, jannh@...gle.com,
vbabka@...e.cz, dancol@...gle.com, joel@...lfernandes.org,
akpm@...ux-foundation.org
Subject: Re: [PATCH 1/2] mm/madvise: help MADV_PAGEOUT to find swap cache pages
On Mon, Mar 23, 2020 at 04:41:49PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@...ux.intel.com>
>
> tl;dr: MADV_PAGEOUT ignores unmapped swap cache pages. Enable
> MADV_PAGEOUT to find and reclaim swap cache.
>
> The long story:
>
> While investigating another issue, I wrote a simple test with two
> processes: a parent and a fork()'d child. The parent reads a
> memory buffer shared by the fork() and the child calls
> madvise(MADV_PAGEOUT) on the same buffer.
>
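
A user-space reproducer along the lines of the test described above might look like the sketch below. It is not the actual test program from the patch: the function name pageout_repro(), the pipe handshake between parent and child, and the use of the parent's getrusage() major-fault count as a rough signal of pages being read back in are all assumptions. The counts it reports depend on the kernel version and on whether swap is configured, so treat it as a harness rather than a source of expected results.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux 5.4+ uapi value; older headers lack it */
#endif

/* Major faults taken so far by the calling process. */
static long major_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_majflt;
}

/*
 * Hypothetical reproducer: the parent reads a buffer shared with a
 * fork()'d child; before each read the child calls
 * madvise(MADV_PAGEOUT) on the same buffer. faults[r] receives the
 * parent's major-fault delta for round r. Returns 0, or -1 on error.
 */
int pageout_repro(size_t npages, int rounds, long *faults)
{
    assert(npages > 0 && rounds > 0 && faults != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    int to_child[2] = { -1, -1 }, to_parent[2] = { -1, -1 };
    pid_t pid = -1;

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 1, len); /* populate: PTEs start out present */

    if (pipe(to_child) != 0 || pipe(to_parent) != 0 || (pid = fork()) < 0) {
        for (int i = 0; i < 2; i++) {
            if (to_child[i] >= 0) close(to_child[i]);
            if (to_parent[i] >= 0) close(to_parent[i]);
        }
        munmap(buf, len);
        return -1;
    }

    if (pid == 0) { /* child: the MADV_PAGEOUT caller, never touches buf */
        char t;
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &t, 1) == 1) {
            (void)madvise(buf, len, MADV_PAGEOUT); /* EINVAL before 5.4 */
            if (write(to_parent[1], &t, 1) != 1)
                break;
        }
        _exit(0);
    }

    close(to_child[0]);
    close(to_parent[1]);
    int rc = 0;
    unsigned long sum = 0;
    for (int r = 0; r < rounds; r++) {
        char t = 'x';
        /* Ask the child to page out, then wait until it has done so. */
        if (write(to_child[1], &t, 1) != 1 || read(to_parent[0], &t, 1) != 1) {
            rc = -1;
            break;
        }
        long before = major_faults();
        for (size_t off = 0; off < len; off += page)
            sum += ((volatile char *)buf)[off]; /* parent only reads */
        faults[r] = major_faults() - before;
    }
    (void)sum;
    close(to_child[1]); /* EOF tells the child to exit */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    munmap(buf, len);
    return rc;
}
```

Calling pageout_repro(8, 3, faults) and printing the three entries shows, round by round, whether the child's MADV_PAGEOUT had any effect on the pages the parent reads.
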
> The first call to MADV_PAGEOUT does what is expected: it pages
> the memory out and causes faults in the parent. However, after
> that, it does not cause any faults in the parent. MADV_PAGEOUT
> only works once! This was a surprise.
>
> The PTEs in the shared buffer start out pte_present()==1 in
> both parent and child. The first MADV_PAGEOUT operation replaces
> those with pte_present()==0 swap PTEs. The parent process
> quickly faults and recreates pte_present()==1 PTEs. However, the
> child process (the one calling MADV_PAGEOUT) never touches the
> memory and has retained the non-present swap PTEs.
>
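
The PTE transitions described above can be traced with a small toy model. This is a hypothetical illustration, not kernel code: one page, one PTE per process, reclaim assumed to complete (the page is written out and freed), and the pre-patch rule that a non-present PTE is skipped. All type and function names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one anonymous page shared by a parent and
 * its fork()'d child, with one PTE in each process. */
enum toy_pte { TOY_PRESENT, TOY_SWAP };

struct toy_state {
    enum toy_pte parent;
    enum toy_pte child;
    bool in_swap_cache; /* page currently held in the swap cache */
};

/* Child calls MADV_PAGEOUT under the pre-patch rule: a non-present PTE
 * is skipped. Returns true if the page was actually paged out. */
static bool toy_child_pageout(struct toy_state *s)
{
    if (s->child != TOY_PRESENT)
        return false;            /* pte_present()==0: ignored */
    /* Reclaim unmaps the page from every PTE mapping it; assume the
     * write-out completes and the page is freed. */
    s->parent = TOY_SWAP;
    s->child = TOY_SWAP;
    s->in_swap_cache = false;
    return true;
}

/* Parent reads the buffer: a swap PTE faults and is made present again.
 * The child still references the swap entry, so the page stays in the
 * swap cache while mapped only by the parent. */
static void toy_parent_read(struct toy_state *s)
{
    if (s->parent == TOY_SWAP) {
        s->parent = TOY_PRESENT;
        s->in_swap_cache = true;
    }
}

/* Runs `rounds` pageout/read rounds; returns how many pageouts took effect. */
static int toy_run(int rounds)
{
    assert(rounds >= 0);
    struct toy_state s = { TOY_PRESENT, TOY_PRESENT, false };
    int effective = 0;

    for (int r = 0; r < rounds; r++) {
        if (toy_child_pageout(&s))
            effective++;
        toy_parent_read(&s);
    }
    return effective;
}
```

However many rounds are run, only the first pageout takes effect: after it, the child's PTE stays a swap PTE and the unpatched rule never looks at the page again.
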
> The same situation can also arise within a single process: some
> of its data has been placed in the swap cache and its PTEs
> replaced by swap PTEs, but the memory has not yet been reclaimed.
>
> The MADV_PAGEOUT code has a pte_present()==0 check that skips
> every non-present PTE. As a result, unmapped swap cache pages are
> effectively immune to MADV_PAGEOUT, which is not very friendly
> behavior.
>
> Enable MADV_PAGEOUT to find and reclaim swap cache. Because a
> swap cache page is not pinned by holding the PTE lock, a
> reference must be held from lookup until the page is isolated,
> at which point isolation obtains a second reference.
>
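
The reference-counting rule in the last paragraph can be illustrated with a toy PTE walk. This is a hypothetical sketch, not the patch itself: toy_swap_cache_get() and toy_isolate() are invented stand-ins for the swap cache lookup and for LRU isolation, and the walk is reduced to refcounts and an isolated flag. With patched set to false it models the old pte_present()==0 skip; with it set to true, swap PTEs are looked up in the swap cache and the lookup reference is held until isolation has taken its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the PTE walk, not kernel code. */
struct toy_page {
    int refcount;  /* references currently held on the page */
    bool isolated; /* taken off the LRU for reclaim */
};

enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_SWAP };

struct toy_pte {
    enum toy_pte_kind kind;
    struct toy_page *page; /* PRESENT: mapped page; SWAP: cached page or NULL */
};

/* Stand-in for a swap cache lookup: returns the cached page with an
 * extra reference, or NULL if the swap entry has no cached page. */
static struct toy_page *toy_swap_cache_get(const struct toy_pte *pte)
{
    if (pte->page)
        pte->page->refcount++;
    return pte->page;
}

/* Stand-in for LRU isolation, which takes its own reference. */
static void toy_isolate(struct toy_page *page)
{
    assert(!page->isolated);
    page->refcount++;
    page->isolated = true;
}

/* Returns how many pages were isolated for reclaim. */
static size_t toy_pageout_walk(struct toy_pte *ptes, size_t n, bool patched)
{
    size_t isolated = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_page *page = NULL;
        bool lookup_ref = false;

        if (ptes[i].kind == TOY_PTE_PRESENT) {
            /* Mapped page: pinned while the PTE lock is held. */
            page = ptes[i].page;
        } else if (patched && ptes[i].kind == TOY_PTE_SWAP) {
            /* Swap cache page: not pinned by the PTE lock, so keep the
             * lookup reference until isolation has taken its own. */
            page = toy_swap_cache_get(&ptes[i]);
            lookup_ref = (page != NULL);
        }
        if (!page)
            continue;
        if (!page->isolated) {
            toy_isolate(page);
            isolated++;
        }
        if (lookup_ref)
            page->refcount--; /* drop the lookup reference */
    }
    return isolated;
}
```

For a swap cache page starting with one reference (the swap cache's own), the unpatched walk isolates nothing, while the patched walk isolates it and leaves exactly two references: the swap cache's and the one held by isolation.
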
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
Acked-by: Minchan Kim <minchan@...nel.org>