lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 22 Sep 2022 11:36:13 +1000
From:   Alistair Popple <apopple@...dia.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>, akpm@...ux-foundation.org,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@....de>,
        John Hubbard <jhubbard@...dia.com>,
        linux-fsdevel@...r.kernel.org, nvdimm@...ts.linux.dev,
        linux-xfs@...r.kernel.org, linux-mm@...ck.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages
 to a zero reference count


Dan Williams <dan.j.williams@...el.com> writes:

> Jason Gunthorpe wrote:
>> On Wed, Sep 21, 2022 at 04:45:22PM -0700, Dan Williams wrote:
>> > Jason Gunthorpe wrote:
>> > > On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
>> > > > The initial memremap_pages() implementation inherited the
>> > > > __init_single_page() default of pages starting life with an elevated
>> > > > reference count. This originally allowed for the page->pgmap pointer to
>> > > > alias with the storage for page->lru since a page was only allowed to be
>> > > > on an lru list when its reference count was zero.
>> > > >
>> > > > Since then, 'struct page' definition cleanups have arranged for
>> > > > dedicated space for the ZONE_DEVICE page metadata, and the
>> > > > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
>> > > > page->_refcount transition to route the page to free_zone_device_page()
>> > > > and not the core-mm page-free. With those cleanups in place and with
>> > > > filesystem-dax and device-dax now converted to take and drop references
>> > > > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
>> > > > and MEMORY_DEVICE_GENERIC reference counts at 0.
>> > > >
>> > > > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
>> > > > pages start life at _refcount 1, so make that the default if
>> > > > pgmap->init_mode is left at zero.
>> > >
>> > > I'm shocked to read this - how does it make any sense?
>> >
>> > I think what happened is that since memremap_pages() historically
>> > produced pages with an elevated reference count that GPU drivers skipped
>> > taking a reference on first allocation and just passed along an elevated
>> > reference count page to the first user.
>> >
>> > So either we keep that assumption or update all users to be prepared for
>> > idle pages coming out of memremap_pages().
>> >
>> > This is all in reaction to the "set_page_count(page, 1);" in
>> > free_zone_device_page(). Which I am happy to get rid of but need from
>> > help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to
>> > memremap_pages() starting all pages at reference count 0.
>>
>> But, but this is all racy, it can't do this:
>>
>> +	if (pgmap->ops && pgmap->ops->page_free)
>> +		pgmap->ops->page_free(page);
>>
>>  	/*
>> +	 * Reset the page count to the @init_mode value to prepare for
>> +	 * handing out the page again.
>>  	 */
>> +	if (pgmap->init_mode == INIT_PAGEMAP_BUSY)
>> +		set_page_count(page, 1);
>>
>> after the fact! Something like that hmm_test has already threaded the
>> "freed" page into the free list via ops->page_free(), it can't have a
>> 0 ref count and be on the free list, even temporarily :(
>>
>> Maybe it nees to be re-ordered?
>>
>> > > How on earth can a free'd page have both a 0 and 1 refcount??
>> >
>> > This is residual wonkiness from memremap_pages() handing out pages with
>> > elevated reference counts at the outset.
>>
>> I think the answer to my question is the above troubled code where we
>> still set the page refcount back to 1 even in the page_free path, so
>> there is some consistency "a freed paged may have a refcount of 1" for
>> the driver.
>>
>> So, I guess this patch makes sense but I would put more noise around
>> INIT_PAGEMAP_BUSY (eg annotate every driver that is using it with the
>> explicit constant) and alert people that they need to fix their stuff
>> to get rid of it.
>
> Sounds reasonable.
>
>> We should definately try to fix hmm_test as well so people have a good
>> reference code to follow in fixing the other drivers :(
>
> Oh, that's a good idea. I can probably fix that up and leave it to the
> GPU driver folks to catch up with that example so we can kill off
> INIT_PAGEMAP_BUSY.

I'm hoping to send my series that fixes up all drivers using device
coherent/private later this week or early next. So you could also just
wait for that and remove INIT_PAGEMAP_BUSY entirely.

 - Alistair

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ