lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 14 Feb 2020 11:29:50 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
        Jan Kara <jack@...e.cz>, Matthew Wilcox <willy@...radead.org>,
        Josef Bacik <josef@...icpanda.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Minchan Kim <minchan@...nel.org>
Subject: [PATCH v2 1/2] mm: make PageReadahead more strict

Recently, I got some bugreports major page fault takes several seconds
sometime. When I review drop mmap_sem logic, I found several bugs.

   CPU 1                                                        CPU 2
mm_populate
 for ()
   ..
   ret = populate_vma_page_range
     __get_user_pages
       faultin_page
         handle_mm_fault
           filemap_fault
             do_async_mmap_readahead
                                                        shrink_page_list
                                                          pageout
                                                            SetPageReclaim(=SetPageReadahead)
                                                              writepage
                                                                SetPageWriteback
               if (PageReadahead(page))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                 page_cache_async_readahead()
                   if (PageWriteback(page))
                     return;

Here, since ret from populate_vma_page_range is zero, the loop continue
to run with same address with previous iteration. It will repeat the
loop until the page's writeout is done(ie, PG_writeback or PG_reclaim
is clear).

We could fix the above specific case via adding PageWriteback

   ret = populate_vma_page_range
           ...
           ...
           filemap_fault
             do_async_mmap_readahead
               if (!PageWriteback(page) && PageReadahead(page))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                 page_cache_async_readahead()
                   if (PageWriteback(page))
                     return;

Furthermore, to prevent potential issues caused by sharing PG_readahead
with PG_reclaim, let's make page flag wrapper for PageReadahead
with description. With that, we could remove PageWriteback check
in page_cache_async_readahead, which is more clear for maintenance/
readability.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Signed-off-by: Minchan Kim <minchan@...nel.org>
---
 include/linux/page-flags.h | 28 ++++++++++++++++++++++++++--
 mm/readahead.c             |  6 ------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1bf83c8fcaa7..f91a9b2a49bd 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -363,8 +363,32 @@ PAGEFLAG(MappedToDisk, mappedtodisk, PF_NO_TAIL)
 /* PG_readahead is only used for reads; PG_reclaim is only for writes */
 PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
-PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
-	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+SETPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+CLEARPAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
+
+/*
+ * Since PG_readahead is shared with PG_reclaim of the page flags,
+ * PageReadahead should double check whether it's readahead marker
+ * or PG_reclaim. It could be done by PageWriteback check because
+ * PG_reclaim is always with PG_writeback.
+ */
+static inline int PageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return (page->flags & (1UL << PG_reclaim | 1UL << PG_writeback)) ==
+		(1UL << PG_reclaim);
+}
+
+/* Clear PG_readahead only if it's PG_readahead, not PG_reclaim */
+static inline int TestClearPageReadahead(struct page *page)
+{
+	VM_BUG_ON_PGFLAGS(PageCompound(page), page);
+
+	return !PageWriteback(page) ||
+			test_and_clear_bit(PG_reclaim, &page->flags);
+}
 
 #ifdef CONFIG_HIGHMEM
 /*
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..85b15e5a1d7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -553,12 +553,6 @@ page_cache_async_readahead(struct address_space *mapping,
 	if (!ra->ra_pages)
 		return;
 
-	/*
-	 * Same bit is used for PG_readahead and PG_reclaim.
-	 */
-	if (PageWriteback(page))
-		return;
-
 	ClearPageReadahead(page);
 
 	/*
-- 
2.25.0.265.gbab2e86ba0-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ