[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ubf5hakohi4hhmoqdxk255rwarwv3vwt7j3l5aqtznorfcxx6r@37jhsud4gjtw>
Date: Wed, 15 Jan 2025 17:38:06 +1100
From: Alistair Popple <apopple@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, dan.j.williams@...el.com,
linux-mm@...ck.org, alison.schofield@...el.com, lina@...hilina.net,
zhang.lyra@...il.com, gerald.schaefer@...ux.ibm.com, vishal.l.verma@...el.com,
dave.jiang@...el.com, logang@...tatee.com, bhelgaas@...gle.com, jack@...e.cz,
jgg@...pe.ca, catalin.marinas@....com, will@...nel.org, mpe@...erman.id.au,
npiggin@...il.com, dave.hansen@...ux.intel.com, ira.weiny@...el.com,
willy@...radead.org, djwong@...nel.org, tytso@....edu, linmiaohe@...wei.com,
peterx@...hat.com, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org, nvdimm@...ts.linux.dev,
linux-cxl@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org, jhubbard@...dia.com, hch@....de, david@...morbit.com,
chenhuacai@...nel.org, kernel@...0n.name, loongarch@...ts.linux.dev
Subject: Re: [PATCH v6 15/26] huge_memory: Add vmf_insert_folio_pud()
On Tue, Jan 14, 2025 at 05:22:15PM +0100, David Hildenbrand wrote:
> On 10.01.25 07:00, Alistair Popple wrote:
> > Currently DAX folio/page reference counts are managed differently to
> > normal pages. To allow these to be managed the same as normal pages
> > introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio
> > and take references as it would for a normally mapped page.
> >
> > This is distinct from the current mechanism, vmf_insert_pfn_pud, which
> > simply inserts a special devmap PUD entry into the page table without
> > holding a reference to the page for the mapping.
> >
> > Signed-off-by: Alistair Popple <apopple@...dia.com>
>
> [...]
>
> > +/**
> > + * vmf_insert_folio_pud - insert a pud size folio mapped by a pud entry
> > + * @vmf: Structure describing the fault
> > + * @folio: folio to insert
> > + * @write: whether it's a write fault
> > + *
> > + * Return: vm_fault_t value.
> > + */
> > +vm_fault_t vmf_insert_folio_pud(struct vm_fault *vmf, struct folio *folio, bool write)
> > +{
> > + struct vm_area_struct *vma = vmf->vma;
> > + unsigned long addr = vmf->address & PUD_MASK;
> > + pud_t *pud = vmf->pud;
> > + struct mm_struct *mm = vma->vm_mm;
> > + spinlock_t *ptl;
> > +
> > + if (addr < vma->vm_start || addr >= vma->vm_end)
> > + return VM_FAULT_SIGBUS;
> > +
> > + if (WARN_ON_ONCE(folio_order(folio) != PUD_ORDER))
> > + return VM_FAULT_SIGBUS;
> > +
> > + ptl = pud_lock(mm, pud);
> > + if (pud_none(*vmf->pud)) {
> > + folio_get(folio);
> > + folio_add_file_rmap_pud(folio, &folio->page, vma);
> > + add_mm_counter(mm, mm_counter_file(folio), HPAGE_PUD_NR);
> > + }
> > + insert_pfn_pud(vma, addr, vmf->pud, pfn_to_pfn_t(folio_pfn(folio)), write);
>
> This looks scary at first (inserting something when not taking a reference),
> but insert_pfn_pud() seems to handle that. A comment here would have been
> nice.
Indeed, I will add one.
> It's weird, though, that if there is already something else, that we only
> WARN but don't actually return an error. So ...
Note we only WARN when there is already a mapping there and we're trying to
upgrade it to writeable. This just mimics the logic which currently exists in
insert_pfn() and insert_pfn_pmd().
The comment in insert_pfn() sheds more light:
/*
* For read faults on private mappings the PFN passed
* in may not match the PFN we have mapped if the
* mapped PFN is a writeable COW page. In the mkwrite
* case we are creating a writable PTE for a shared
* mapping and we expect the PFNs to match. If they
* don't match, we are likely racing with block
* allocation and mapping invalidation so just skip the
* update.
*/
> > + spin_unlock(ptl);
> > +
> > + return VM_FAULT_NOPAGE;
>
> I assume always returning VM_FAULT_NOPAGE, even when something went wrong,
> is the right thing to do?
Yes, I think so. I guess in the WARN case we could return something like
VM_FAULT_SIGBUS to kill the application, but the existing vmf_insert_*()
functions don't currently do that so I think that would be a separate clean-up.
> Apart from that LGTM.
>
>
> --
> Cheers,
>
> David / dhildenb
>
Powered by blists - more mailing lists