[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YyssywF6HmZrfqhD@nvidia.com>
Date: Wed, 21 Sep 2022 12:24:59 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Dan Williams <dan.j.williams@...el.com>,
Alistair Popple <apopple@...dia.com>
Cc: akpm@...ux-foundation.org, Matthew Wilcox <willy@...radead.org>,
Jan Kara <jack@...e.cz>, "Darrick J. Wong" <djwong@...nel.org>,
Christoph Hellwig <hch@....de>,
John Hubbard <jhubbard@...dia.com>,
linux-fsdevel@...r.kernel.org, nvdimm@...ts.linux.dev,
linux-xfs@...r.kernel.org, linux-mm@...ck.org,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages
to a zero reference count
On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
> The initial memremap_pages() implementation inherited the
> __init_single_page() default of pages starting life with an elevated
> reference count. This originally allowed for the page->pgmap pointer to
> alias with the storage for page->lru since a page was only allowed to be
> on an lru list when its reference count was zero.
>
> Since then, 'struct page' definition cleanups have arranged for
> dedicated space for the ZONE_DEVICE page metadata, and the
> MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
> page->_refcount transition to route the page to free_zone_device_page()
> and not the core-mm page-free. With those cleanups in place and with
> filesystem-dax and device-dax now converted to take and drop references
> at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
> and MEMORY_DEVICE_GENERIC reference counts at 0.
>
> MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
> pages start life at _refcount 1, so make that the default if
> pgmap->init_mode is left at zero.
I'm shocked to read this - how does it make any sense?
dev_pagemap_ops->page_free() is only called on the 1->0 transition, so
any driver which implements it must be expecting pages to have a 0
refcount.
Looking around everything but only fsdax_pagemap_ops implements
page_free()
So, how does it work? Surely the instant the page map is created all
the pages must be considered 'free', and after page_free() is called I
would also expect the page to be considered free.
How on earth can a free'd page have both a 0 and 1 refcount??
eg look at the simple hmm_test, it threads pages on to the
mdevice->free_pages list immediately after memremap_pages and then
again inside page_free() - it is completely wrong that they would have
different refcounts while on the free_pages list.
I would expect that after the page is removed from the free_pages list
it will have its recount set to 1 to make it non-free then it will go
through the migration.
Alistair how should the refcounting be working here in hmm_test?
Jason
Powered by blists - more mailing lists