[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200214192951.29430-2-minchan@kernel.org>
Date: Fri, 14 Feb 2020 11:29:51 -0800
From: Minchan Kim <minchan@...nel.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>,
Josef Bacik <josef@...icpanda.com>,
Johannes Weiner <hannes@...xchg.org>,
Minchan Kim <minchan@...nel.org>
Subject: [PATCH v2 2/2] mm: fix long time stall from mm_populate
Basically, fault handler releases mmap_sem before requesting readahead
and then it is supposed to retry lookup the page from page cache with
FAULT_FLAG_TRIED so that it avoids the live lock of infinite retry.
However, what happens if the fault handler find a page from page
cache and the page has readahead marker but are waiting under
writeback? Plus one more condition, it happens under mm_populate
which repeats faulting unless it encounters error. So let's assemble
conditions below.
CPU 1 CPU 2
- first loop
mm_populate
for ()
..
ret = populate_vma_page_range
__get_user_pages
faultin_page
handle_mm_fault
filemap_fault
do_async_mmap_readahead
if (PageReadahead(pageA))
maybe_unlock_mmap_for_io
up_read(mmap_sem)
shrink_page_list
pageout
SetPageReclaim(=SetPageReadahead)(pageA)
writepage
SetPageWriteback(pageA)
page_cache_async_readahead()
ClearPageReadahead(pageA)
do_async_mmap_readahead
lock_page_maybe_drop_mmap
goto out_retry
the pageA is reclaimed
and new pageB is populated to the file offset
and finally has become PG_readahead
- second loop
__get_user_pages
faultin_page
handle_mm_fault
filemap_fault
do_async_mmap_readahead
if (PageReadahead(pageB))
maybe_unlock_mmap_for_io
up_read(mmap_sem)
shrink_page_list
pageout
SetPageReclaim(=SetPageReadahead)(pageB)
writepage
SetPageWriteback(pageB)
page_cache_async_readahead()
ClearPageReadahead(pageB)
do_async_mmap_readahead
lock_page_maybe_drop_mmap
goto out_retry
It could be repeated forever so it's livelock. without involving reclaim,
it could happens if ra_pages become zero by fadvise/other threads who
have same fd one doing randome while the other one is sequential
because page_cache_async_readahead has following condition check like
PageWriteback and ra_pages are never synchrnized with fadvise and
shrink_readahead_size_eio from other threads.
void page_cache_async_readahead(struct address_space *mapping,
unsigned long req_size)
{
/* no read-ahead */
if (!ra->ra_pages)
return;
Thus, we need to limit fault retry from mm_populate like page
fault handler.
Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reviewed-by: Jan Kara <jack@...e.cz>
Signed-off-by: Minchan Kim <minchan@...nel.org>
---
mm/gup.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..6f6548c63ad5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
*
* This takes care of mlocking the pages too if VM_LOCKED is set.
*
- * return 0 on success, negative error code on error.
+ * return number of pages pinned on success, negative error code on error.
*
* vma->vm_mm->mmap_sem must be held.
*
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
struct vm_area_struct *vma = NULL;
int locked = 0;
long ret = 0;
+ bool tried = false;
end = start + len;
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
* double checks the vma flags, so that it won't mlock pages
* if the vma was already munlocked.
*/
- ret = populate_vma_page_range(vma, nstart, nend, &locked);
+ ret = populate_vma_page_range(vma, nstart, nend,
+ tried ? NULL : &locked);
if (ret < 0) {
if (ignore_errors) {
ret = 0;
continue; /* continue at next VMA */
}
break;
- }
+ } else if (ret == 0)
+ tried = true;
+ else
+ tried = false;
nend = nstart + ret * PAGE_SIZE;
ret = 0;
}
--
2.25.0.265.gbab2e86ba0-goog
Powered by blists - more mailing lists