[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b6d36b9-ce12-4278-a697-0186088e17b1@redhat.com>
Date: Tue, 2 Jul 2024 15:01:04 +0200
From: David Hildenbrand <david@...hat.com>
To: Alistair Popple <apopple@...dia.com>
Cc: dan.j.williams@...el.com, vishal.l.verma@...el.com, dave.jiang@...el.com,
logang@...tatee.com, bhelgaas@...gle.com, jack@...e.cz, jgg@...pe.ca,
catalin.marinas@....com, will@...nel.org, mpe@...erman.id.au,
npiggin@...il.com, dave.hansen@...ux.intel.com, ira.weiny@...el.com,
willy@...radead.org, djwong@...nel.org, tytso@....edu, linmiaohe@...wei.com,
peterx@...hat.com, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
nvdimm@...ts.linux.dev, linux-cxl@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org, jhubbard@...dia.com,
hch@....de, david@...morbit.com
Subject: Re: [PATCH 07/13] huge_memory: Allow mappings of PUD sized pages
On 02.07.24 13:30, Alistair Popple wrote:
>
> David Hildenbrand <david@...hat.com> writes:
>
>> On 02.07.24 12:19, Alistair Popple wrote:
>>> David Hildenbrand <david@...hat.com> writes:
>>>
>>>> On 27.06.24 02:54, Alistair Popple wrote:
>>>>> Currently DAX folio/page reference counts are managed differently to
>>>>> normal pages. To allow these to be managed the same as normal pages
>>>>> introduce dax_insert_pfn_pud. This will map the entire PUD-sized folio
>>>>> and take references as it would for a normally mapped page.
>>>>> This is distinct from the current mechanism, vmf_insert_pfn_pud,
>>>>> which
>>>>> simply inserts a special devmap PUD entry into the page table without
>>>>> holding a reference to the page for the mapping.
>>>>
>>>> Do we really have to involve mapcounts/rmap for daxfs pages at this
>>>> point? Or is this only "to make it look more like other pages" ?
>>> The aim of the series is make FS DAX and other ZONE_DEVICE pages
>>> look
>>> like other pages, at least with regards to the way they are refcounted.
>>> At the moment they are not refcounted - instead their refcounts are
>>> basically statically initialised to one and there are all these special
>>> cases and functions requiring magic PTE bits (pXX_devmap) to do the
>>> special DAX reference counting. This then adds some cruft to manage
>>> pgmap references and to catch the 2->1 page refcount transition. All
>>> this just goes away if we manage the page references the same as other
>>> pages (and indeed we already manage DEVICE_PRIVATE and COHERENT pages
>>> the same as normal pages).
>>> So I think to make this work we at least need the mapcounts.
>>>
>>
>> We only really need the mapcounts if we intend to do something like
>> folio_mapcount() == folio_ref_count() to detect unexpected folio
>> references, and if we have to have things like folio_mapped()
>> working. For now that was not required, that's why I am asking.
>
> Oh I see, thanks for pointing that out. In that case I agree, we don't
> need the mapcounts. As you say we don't currently need to detect
> unexpect references for FS DAX and this series doesn't seek to introduce
> any new behviour/features.
>
>> Background also being that in a distant future folios will be
>> decoupled more from other compound pages, and only folios (or "struct
>> anon_folio" / "struct file_folio") would even have mapcounts.
>>
>> For example, most stuff we map (and refcount!) via vm_insert_page()
>> really must stop involving mapcounts. These won't be "ordinary"
>> mapcount-tracked folios in the future, they are simply some refcounted
>> pages some ordinary driver allocated.
>
> Ok, so for FS DAX we should take a reference on the folio for the
> mapping but not a mapcount?
That's what we could do, yes. But if we really just want to track page
table mappings like we do with other folios (anon/pagecache/shmem), and
want to keep doing that in the future, then maybe we should just do it
right away. The rmap code you add will be required in the future either way.
Using that code for device dax would likely be impossible right now due
to the vmemmap optimizations I guess, if we would end up writing subpage
mapcounts. But for FSDAX purposes (no vmemmap optimization) it would work.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists