[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <632ba212e32bb_349629451@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 21 Sep 2022 16:45:22 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Dan Williams <dan.j.williams@...el.com>,
Alistair Popple <apopple@...dia.com>
CC: <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>,
"Jan Kara" <jack@...e.cz>, "Darrick J. Wong" <djwong@...nel.org>,
"Christoph Hellwig" <hch@....de>,
John Hubbard <jhubbard@...dia.com>,
<linux-fsdevel@...r.kernel.org>, <nvdimm@...ts.linux.dev>,
<linux-xfs@...r.kernel.org>, <linux-mm@...ck.org>,
<linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages
to a zero reference count
Jason Gunthorpe wrote:
> On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
> > The initial memremap_pages() implementation inherited the
> > __init_single_page() default of pages starting life with an elevated
> > reference count. This originally allowed for the page->pgmap pointer to
> > alias with the storage for page->lru since a page was only allowed to be
> > on an lru list when its reference count was zero.
> >
> > Since then, 'struct page' definition cleanups have arranged for
> > dedicated space for the ZONE_DEVICE page metadata, and the
> > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
> > page->_refcount transition to route the page to free_zone_device_page()
> > and not the core-mm page-free. With those cleanups in place and with
> > filesystem-dax and device-dax now converted to take and drop references
> > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
> > and MEMORY_DEVICE_GENERIC reference counts at 0.
> >
> > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
> > pages start life at _refcount 1, so make that the default if
> > pgmap->init_mode is left at zero.
>
> I'm shocked to read this - how does it make any sense?
I think what happened is that since memremap_pages() historically
produced pages with an elevated reference count that GPU drivers skipped
taking a reference on first allocation and just passed along an elevated
reference count page to the first user.
So either we keep that assumption or update all users to be prepared for
idle pages coming out of memremap_pages().
This is all in reaction to the "set_page_count(page, 1);" in
free_zone_device_page(). Which I am happy to get rid of but need from
help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to
memremap_pages() starting all pages at reference count 0.
> dev_pagemap_ops->page_free() is only called on the 1->0 transition, so
> any driver which implements it must be expecting pages to have a 0
> refcount.
>
> Looking around everything but only fsdax_pagemap_ops implements
> page_free()
Right.
> So, how does it work? Surely the instant the page map is created all
> the pages must be considered 'free', and after page_free() is called I
> would also expect the page to be considered free.
The GPU drivers need to increment reference counts when they hand out
the page rather than reuse the reference count that they get by default.
> How on earth can a free'd page have both a 0 and 1 refcount??
This is residual wonkiness from memremap_pages() handing out pages with
elevated reference counts at the outset.
> eg look at the simple hmm_test, it threads pages on to the
> mdevice->free_pages list immediately after memremap_pages and then
> again inside page_free() - it is completely wrong that they would have
> different refcounts while on the free_pages list.
I do not see any page_ref_inc() in that test, only put_page() so it is
assuming non-idle pages at the outset.
> I would expect that after the page is removed from the free_pages list
> it will have its recount set to 1 to make it non-free then it will go
> through the migration.
>
> Alistair how should the refcounting be working here in hmm_test?
>
> Jason
Powered by blists - more mailing lists