linux-ext4 - Re: [PATCH v2 07/19] mm/migrate: Convert expected_page_refs() to folio_expected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6e7599d1-8a5f-bf16-383c-febd753bd051@google.com>
Date:   Thu, 7 Jul 2022 19:50:17 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     "Matthew Wilcox (Oracle)" <willy@...radead.org>
cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-block@...r.kernel.org, linux-aio@...ck.org,
        linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, cluster-devel@...hat.com,
        linux-mm@...ck.org, linux-xfs@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-ntfs-dev@...ts.sourceforge.net,
        ocfs2-devel@....oracle.com, linux-mtd@...ts.infradead.org,
        virtualization@...ts.linux-foundation.org,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 07/19] mm/migrate: Convert expected_page_refs() to
 folio_expected_refs()

On Wed, 8 Jun 2022, Matthew Wilcox (Oracle) wrote:

> Now that both callers have a folio, convert this function to
> take a folio & rename it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> Reviewed-by: Christoph Hellwig <hch@....de>
> ---
>  mm/migrate.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2975f0c4d7cf..2e2f41572066 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -336,13 +336,18 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>  }
>  #endif
>  
> -static int expected_page_refs(struct address_space *mapping, struct page *page)
> +static int folio_expected_refs(struct address_space *mapping,
> +		struct folio *folio)
>  {
> -	int expected_count = 1;
> +	int refs = 1;
> +	if (!mapping)
> +		return refs;
>  
> -	if (mapping)
> -		expected_count += compound_nr(page) + page_has_private(page);
> -	return expected_count;
> +	refs += folio_nr_pages(folio);
> +	if (folio_get_private(folio))
> +		refs++;
> +
> +	return refs;
>  }
>  
>  /*
> @@ -359,7 +364,7 @@ int folio_migrate_mapping(struct address_space *mapping,
>  	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
>  	struct zone *oldzone, *newzone;
>  	int dirty;
> -	int expected_count = expected_page_refs(mapping, &folio->page) + extra_count;
> +	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
>  	long nr = folio_nr_pages(folio);
>  
>  	if (!mapping) {
> @@ -669,7 +674,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
>  		return migrate_page(mapping, &dst->page, &src->page, mode);
>  
>  	/* Check whether page does not have extra refs before we do more work */
> -	expected_count = expected_page_refs(mapping, &src->page);
> +	expected_count = folio_expected_refs(mapping, src);
>  	if (folio_ref_count(src) != expected_count)
>  		return -EAGAIN;
>  
> -- 
> 2.35.1

This commit (742e89c9e352d38df1a5825fe40c4de73a5d5f7a in pagecache.git
folio/for-next and recent linux-next) is dangerously wrong, at least
for swapcache, and probably for some others.

I say "dangerously" because it tells page migration a swapcache page
is safe for migration when it certainly is not.

The fun that typically ensues is kernel BUG at include/linux/mm.h:750!
put_page_testzero() VM_BUG_ON_PAGE(page_ref_count(page) == 0, page),
if CONFIG_DEBUG_VM=y (bisecting for that is what brought me to this).
But I guess you might get silent data corruption too.

I assumed at first that you'd changed the rules, and were now expecting
any subsystem that puts a non-zero value into folio->private to raise
its refcount - whereas the old convention (originating with buffer heads)
is that setting PG_private says an extra refcount has been taken, please
call try_to_release_page() to lower it, and maybe that will use data in
page->private to do so; but page->private free for the subsystem owning
the page to use as it wishes, no refcount implication.  But that you
had missed updating swapcache.

So I got working okay with the patch below; but before turning it into
a proper patch, noticed that there were still plenty of other places
applying the test for PG_private: so now think that maybe you set out
with intention as above, realized it wouldn't work, but got distracted
before cleaning up some places you'd already changed.  And patch below
now goes in the wrong direction.

Or maybe you didn't intend any change, but the PG_private test just got
missed in a few places.  I don't know, hope you remember, but current
linux-next badly inconsistent.
Over to you, thanks,

Hugh

--- a/mm/migrate.c	2022-07-06 14:24:44.499941975 -0700
+++ b/mm/migrate.c	2022-07-06 15:49:25.000000000 -0700
@@ -351,6 +351,10 @@ unlock:
 }
 #endif
 
+static inline bool folio_counted_private(struct folio *folio)
+{
+	return !folio_test_swapcache(folio) && folio_get_private(folio);
+}
 static int folio_expected_refs(struct address_space *mapping,
 		struct folio *folio)
 {
@@ -359,7 +363,7 @@ static int folio_expected_refs(struct ad
 		return refs;
 
 	refs += folio_nr_pages(folio);
-	if (folio_get_private(folio))
+	if (folio_counted_private(folio))
 		refs++;
 
 	return refs;
--- a/mm/vmscan.c	2022-07-06 14:24:44.531942217 -0700
+++ b/mm/vmscan.c	2022-07-06 15:49:37.000000000 -0700
@@ -2494,6 +2494,10 @@ shrink_inactive_list(unsigned long nr_to
  * The downside is that we have to touch folio->_refcount against each folio.
  * But we had to alter folio->flags anyway.
  */
+static inline bool folio_counted_private(struct folio *folio)
+{
+	return !folio_test_swapcache(folio) && folio_get_private(folio);
+}
 static void shrink_active_list(unsigned long nr_to_scan,
 			       struct lruvec *lruvec,
 			       struct scan_control *sc,
@@ -2538,8 +2542,9 @@ static void shrink_active_list(unsigned
 		}
 
 		if (unlikely(buffer_heads_over_limit)) {
-			if (folio_get_private(folio) && folio_trylock(folio)) {
-				if (folio_get_private(folio))
+			if (folio_counted_private(folio) &&
+			    folio_trylock(folio)) {
+				if (folio_counted_private(folio))
 					filemap_release_folio(folio, 0);
 				folio_unlock(folio);
 			}