lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrZOqbS3bcj52JZP@x1n>
Date: Fri, 9 Aug 2024 13:15:21 -0400
From: Peter Xu <peterx@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Sean Christopherson <seanjc@...gle.com>,
	Oscar Salvador <osalvador@...e.de>,
	Jason Gunthorpe <jgg@...dia.com>,
	Axel Rasmussen <axelrasmussen@...gle.com>,
	linux-arm-kernel@...ts.infradead.org, x86@...nel.org,
	Will Deacon <will@...nel.org>, Gavin Shan <gshan@...hat.com>,
	Paolo Bonzini <pbonzini@...hat.com>, Zi Yan <ziy@...dia.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Ingo Molnar <mingo@...hat.com>,
	Alistair Popple <apopple@...dia.com>,
	Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>, kvm@...r.kernel.org,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Alex Williamson <alex.williamson@...hat.com>,
	Yan Zhao <yan.y.zhao@...el.com>
Subject: Re: [PATCH 07/19] mm/fork: Accept huge pfnmap entries

On Fri, Aug 09, 2024 at 06:32:44PM +0200, David Hildenbrand wrote:
> On 09.08.24 18:08, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels.  Pud is
> > much easier, the write bit needs to be persisted though for writable and
> > shared pud mappings like PFNMAP ones, otherwise a follow up write in either
> > parent or child process will trigger a write fault.
> > 
> > Do the same for pmd level.
> > 
> > Signed-off-by: Peter Xu <peterx@...hat.com>
> > ---
> >   mm/huge_memory.c | 27 ++++++++++++++++++++++++---
> >   1 file changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 6568586b21ab..015c9468eed5 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1375,6 +1375,22 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >   	pgtable_t pgtable = NULL;
> >   	int ret = -ENOMEM;
> > +	pmd = pmdp_get_lockless(src_pmd);
> > +	if (unlikely(pmd_special(pmd))) {
> > +		dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > +		src_ptl = pmd_lockptr(src_mm, src_pmd);
> > +		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > +		/*
> > +		 * No need to recheck the pmd, it can't change with write
> > +		 * mmap lock held here.
> > +		 */
> > +		if (is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd)) {
> > +			pmdp_set_wrprotect(src_mm, addr, src_pmd);
> > +			pmd = pmd_wrprotect(pmd);
> > +		}
> > +		goto set_pmd;
> > +	}
> > +
> 
> I strongly assume we should be using using vm_normal_page_pmd() instead of
> pmd_page() further below. pmd_special() should be mostly limited to GUP-fast
> and vm_normal_page_pmd().

One thing to mention that it has this:

	if (!vma_is_anonymous(dst_vma))
		return 0;

So it's only about anonymous below that.  In that case I feel like the
pmd_page() is benign, and actually good.

Though what you're saying here made me notice my above check doesn't seem
to be necessary, I mean, "(is_cow_mapping(src_vma->vm_flags) &&
pmd_write(pmd))" can't be true when special bit is set, aka, pfnmaps.. and
if it's writable for CoW it means it's already an anon.

I think I can probably drop that line there, perhaps with a
VM_WARN_ON_ONCE() making sure it won't happen.

> 
> Again, we should be doing this similar to how we handle PTEs.
> 
> I'm a bit confused about the "unlikely(!pmd_trans_huge(pmd)" check, below:
> what else should we have here if it's not a migration entry but a present
> entry?

I had a feeling that it was just a safety belt since the 1st day of thp
when Andrea worked that out, so that it'll work with e.g. file truncation
races.

But with current code it looks like it's only anonymous indeed, so looks
not possible at least from that pov.

Thanks,

> 
> Likely this function needs a bit of rework.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Peter Xu


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ