linux-ext4 - Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages to a zero reference count

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yyuml1tSKPmvLS6P@nvidia.com>
Date:   Wed, 21 Sep 2022 21:04:39 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Alistair Popple <apopple@...dia.com>, akpm@...ux-foundation.org,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@....de>,
        John Hubbard <jhubbard@...dia.com>,
        linux-fsdevel@...r.kernel.org, nvdimm@...ts.linux.dev,
        linux-xfs@...r.kernel.org, linux-mm@...ck.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages
 to a zero reference count

On Wed, Sep 21, 2022 at 04:45:22PM -0700, Dan Williams wrote:
> Jason Gunthorpe wrote:
> > On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
> > > The initial memremap_pages() implementation inherited the
> > > __init_single_page() default of pages starting life with an elevated
> > > reference count. This originally allowed for the page->pgmap pointer to
> > > alias with the storage for page->lru since a page was only allowed to be
> > > on an lru list when its reference count was zero.
> > > 
> > > Since then, 'struct page' definition cleanups have arranged for
> > > dedicated space for the ZONE_DEVICE page metadata, and the
> > > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
> > > page->_refcount transition to route the page to free_zone_device_page()
> > > and not the core-mm page-free. With those cleanups in place and with
> > > filesystem-dax and device-dax now converted to take and drop references
> > > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
> > > and MEMORY_DEVICE_GENERIC reference counts at 0.
> > > 
> > > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
> > > pages start life at _refcount 1, so make that the default if
> > > pgmap->init_mode is left at zero.
> > 
> > I'm shocked to read this - how does it make any sense?
> 
> I think what happened is that since memremap_pages() historically
> produced pages with an elevated reference count that GPU drivers skipped
> taking a reference on first allocation and just passed along an elevated
> reference count page to the first user.
> 
> So either we keep that assumption or update all users to be prepared for
> idle pages coming out of memremap_pages().
> 
> This is all in reaction to the "set_page_count(page, 1);" in
> free_zone_device_page(). Which I am happy to get rid of but need from
> help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to
> memremap_pages() starting all pages at reference count 0.

But, but this is all racy, it can't do this:

+	if (pgmap->ops && pgmap->ops->page_free)
+		pgmap->ops->page_free(page);
 
 	/*
+	 * Reset the page count to the @init_mode value to prepare for
+	 * handing out the page again.
 	 */
+	if (pgmap->init_mode == INIT_PAGEMAP_BUSY)
+		set_page_count(page, 1);

after the fact! Something like that hmm_test has already threaded the
"freed" page into the free list via ops->page_free(), it can't have a
0 ref count and be on the free list, even temporarily :(

Maybe it nees to be re-ordered?

> > How on earth can a free'd page have both a 0 and 1 refcount??
> 
> This is residual wonkiness from memremap_pages() handing out pages with
> elevated reference counts at the outset.

I think the answer to my question is the above troubled code where we
still set the page refcount back to 1 even in the page_free path, so
there is some consistency "a freed paged may have a refcount of 1" for
the driver.

So, I guess this patch makes sense but I would put more noise around
INIT_PAGEMAP_BUSY (eg annotate every driver that is using it with the
explicit constant) and alert people that they need to fix their stuff
to get rid of it.

We should definately try to fix hmm_test as well so people have a good
reference code to follow in fixing the other drivers :(

Jason