Message-ID: <Ysekikp60Hhs5lV0@casper.infradead.org>
Date: Fri, 8 Jul 2022 04:29:14 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org, linux-aio@...ck.org,
linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net, cluster-devel@...hat.com,
linux-mm@...ck.org, linux-xfs@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-ntfs-dev@...ts.sourceforge.net,
ocfs2-devel@....oracle.com, linux-mtd@...ts.infradead.org,
virtualization@...ts.linux-foundation.org,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 07/19] mm/migrate: Convert expected_page_refs() to
folio_expected_refs()
On Thu, Jul 07, 2022 at 07:50:17PM -0700, Hugh Dickins wrote:
> On Wed, 8 Jun 2022, Matthew Wilcox (Oracle) wrote:
>
> > Now that both callers have a folio, convert this function to
> > take a folio & rename it.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> > Reviewed-by: Christoph Hellwig <hch@....de>
> > ---
> > mm/migrate.c | 19 ++++++++++++-------
> > 1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2975f0c4d7cf..2e2f41572066 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -336,13 +336,18 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
> >  }
> >  #endif
> >
> > -static int expected_page_refs(struct address_space *mapping, struct page *page)
> > +static int folio_expected_refs(struct address_space *mapping,
> > +		struct folio *folio)
> >  {
> > -	int expected_count = 1;
> > +	int refs = 1;
> > +	if (!mapping)
> > +		return refs;
> >
> > -	if (mapping)
> > -		expected_count += compound_nr(page) + page_has_private(page);
> > -	return expected_count;
> > +	refs += folio_nr_pages(folio);
> > +	if (folio_get_private(folio))
> > +		refs++;
> > +
> > +	return refs;
> >  }
> >
> >  /*
> > @@ -359,7 +364,7 @@ int folio_migrate_mapping(struct address_space *mapping,
> >  	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
> >  	struct zone *oldzone, *newzone;
> >  	int dirty;
> > -	int expected_count = expected_page_refs(mapping, &folio->page) + extra_count;
> > +	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
> >  	long nr = folio_nr_pages(folio);
> >
> >  	if (!mapping) {
> > @@ -669,7 +674,7 @@ static int __buffer_migrate_folio(struct address_space *mapping,
> >  		return migrate_page(mapping, &dst->page, &src->page, mode);
> >
> >  	/* Check whether page does not have extra refs before we do more work */
> > -	expected_count = expected_page_refs(mapping, &src->page);
> > +	expected_count = folio_expected_refs(mapping, src);
> >  	if (folio_ref_count(src) != expected_count)
> >  		return -EAGAIN;
> >
> > --
> > 2.35.1
>
> This commit (742e89c9e352d38df1a5825fe40c4de73a5d5f7a in pagecache.git
> folio/for-next and recent linux-next) is dangerously wrong, at least
> for swapcache, and probably for some others.
>
> I say "dangerously" because it tells page migration a swapcache page
> is safe for migration when it certainly is not.
>
> The fun that typically ensues is "kernel BUG at include/linux/mm.h:750!",
> put_page_testzero()'s VM_BUG_ON_PAGE(page_ref_count(page) == 0, page),
> if CONFIG_DEBUG_VM=y (bisecting for that is what brought me to this).
> But I guess you might get silent data corruption too.
>
> I assumed at first that you'd changed the rules, and were now expecting
> any subsystem that puts a non-zero value into folio->private to raise
> its refcount - whereas the old convention (originating with buffer heads)
> is that setting PG_private says an extra refcount has been taken: please
> call try_to_release_page() to lower it, and maybe that will use data in
> page->private to do so; but page->private itself is free for the subsystem
> owning the page to use as it wishes, with no refcount implication. And I
> assumed that you had simply missed updating swapcache.
>
> So I got it working okay with the patch below; but before turning it into
> a proper patch, I noticed that there were still plenty of other places
> applying the test for PG_private: so now I think that maybe you set out
> with the intention as above, realized it wouldn't work, but got distracted
> before cleaning up some places you'd already changed. And the patch below
> now goes in the wrong direction.
>
> Or maybe you didn't intend any change, but the PG_private test just got
> missed in a few places. I don't know - I hope you remember - but current
> linux-next is badly inconsistent.
> Over to you, thanks,
Ugh. The problem I'm trying to solve is that we're short on page flags.
We _seemed_ to have a correlation between "page->private != NULL" and
"PG_private is set", and so I thought I could make progress towards
removing PG_private. But the rule you set out above wasn't written down
anywhere that I was able to find.
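
To spell out the difference between the old test and the one I converted
it to, roughly (a paraphrase from memory, not exact source; the folio
names are just illustrative):

	/*
	 * Old convention: only the flag implies an extra refcount, taken
	 * by whoever set PG_private (buffer heads originally) and dropped
	 * via try_to_release_page().
	 */
	if (page_has_private(page))	/* tests PG_private | PG_private_2 */
		expected_count++;

	/*
	 * What the converted helper tests instead: the pointer itself.
	 * A swapcache folio keeps its swp_entry_t in folio->private
	 * without setting PG_private and without taking an extra
	 * reference, so this counts a reference that was never taken.
	 */
	if (folio_get_private(folio))	/* tests folio->private != NULL */
		refs++;
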
I'm about to go to sleep, but I'll think on this some more tomorrow.
> Hugh
>
> --- a/mm/migrate.c	2022-07-06 14:24:44.499941975 -0700
> +++ b/mm/migrate.c	2022-07-06 15:49:25.000000000 -0700
> @@ -351,6 +351,10 @@ unlock:
>  }
>  #endif
>
> +static inline bool folio_counted_private(struct folio *folio)
> +{
> +	return !folio_test_swapcache(folio) && folio_get_private(folio);
> +}
>  static int folio_expected_refs(struct address_space *mapping,
>  		struct folio *folio)
>  {
> @@ -359,7 +363,7 @@ static int folio_expected_refs(struct ad
>  		return refs;
>
>  	refs += folio_nr_pages(folio);
> -	if (folio_get_private(folio))
> +	if (folio_counted_private(folio))
>  		refs++;
>
>  	return refs;
> --- a/mm/vmscan.c	2022-07-06 14:24:44.531942217 -0700
> +++ b/mm/vmscan.c	2022-07-06 15:49:37.000000000 -0700
> @@ -2494,6 +2494,10 @@ shrink_inactive_list(unsigned long nr_to
>   * The downside is that we have to touch folio->_refcount against each folio.
>   * But we had to alter folio->flags anyway.
>   */
> +static inline bool folio_counted_private(struct folio *folio)
> +{
> +	return !folio_test_swapcache(folio) && folio_get_private(folio);
> +}
>  static void shrink_active_list(unsigned long nr_to_scan,
>  			       struct lruvec *lruvec,
>  			       struct scan_control *sc,
> @@ -2538,8 +2542,9 @@ static void shrink_active_list(unsigned
>  		}
>
>  		if (unlikely(buffer_heads_over_limit)) {
> -			if (folio_get_private(folio) && folio_trylock(folio)) {
> -				if (folio_get_private(folio))
> +			if (folio_counted_private(folio) &&
> +			    folio_trylock(folio)) {
> +				if (folio_counted_private(folio))
>  					filemap_release_folio(folio, 0);
>  				folio_unlock(folio);
>  			}
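
To make explicit what the hack above changes (my reading of the helper,
illustrative only; the folio names below are just placeholders):

	/*
	 * A swapcache folio keeps a swp_entry_t in folio->private, but no
	 * extra reference was taken for it, so it is not counted.
	 */
	folio_counted_private(some_swapcache_folio);	/* false */

	/*
	 * A folio with buffer heads has PG_private set, folio->private
	 * pointing at the buffer_head list, and an extra reference taken
	 * when they were attached - that one is still counted.
	 */
	folio_counted_private(some_bh_folio);		/* true */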