Message-ID: <ox3rg6uyazlaeshxeub5hxv4z4bjai222mkitoduktmar5l3pd@qfxv4jdnj5xo>
Date: Mon, 22 Jul 2024 16:22:45 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>, Al Viro <viro@...iv.linux.org.uk>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "Kirill A . Shutemov" <kirill@...temov.name>, x86@...nel.org,
        Yan Zhao <yan.y.zhao@...el.com>, Kevin Tian <kevin.tian@...el.com>,
        Pei Li <peili.dev@...il.com>, David Hildenbrand <david@...hat.com>,
        David Wang <00107082@....com>, Bert Karwatzki <spasswolf@....de>,
        Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH] mm/x86/pat: Only untrack the pfn range if unmap region

* Peter Xu <peterx@...hat.com> [240722 11:15]:
> On Fri, Jul 19, 2024 at 10:18:12PM -0400, Liam R. Howlett wrote:
> > * Peter Xu <peterx@...hat.com> [240712 10:43]:
> > > This patch is one patch of an old series [1], reposted standalone here in
> > > the hope of fixing some untrack_pfn() issues reported recently [2,3].
> > > There was another fix [4], but unfortunately it looks like it causes
> > > other issues.  The hope is this patch can fix it the right way.
> > > 
> > > X86 uses pfn tracking to do pfnmaps.  AFAICT, the tracking should normally
> > > start at mmap() of device drivers, then untracked when munmap().  However
> > > in the current code the untrack is done in unmap_single_vma().  This might
> > > be problematic.
> > > 
> > > For example, unmap_single_vma() can be used nowadays even for zapping a
> > > single page rather than the whole vmas.  It's very confusing to do whole
> > > vma untracking in this function even if a caller would like to zap one
> > > page.  It could simply be wrong.
> > > 
> > > Such an issue won't be exposed by things like MADV_DONTNEED, which won't
> > > ever work for pfnmaps and will fail the madvise() before reaching here.
> > > However, it looks like it can be triggered as reported, when invoked
> > > from an unmap request from a file vma.
> > > 
> > > There's also work [5] on VFIO (merged now) to allow tearing down MMIO
> > > pgtables before an munmap(), in which case we may not want to untrack the
> > > pfns if we're only tearing down the pgtables.  IOW, we may want to keep the
> > > pfn tracking information as those pfn mappings can be restored later with
> > > the same vma object.  Currently it's not an immediate problem for VFIO, as
> > > VFIO uses UC- by default, but it looks like there's plan to extend that in
> > > the near future.
> > > 
> > > IIUC, this was overlooked when zap_page_range_single() was introduced,
> > > while in the past it was only used in the munmap() path which wants to
> > > always unmap the region completely.  E.g., commit f5cc4eef9987 ("VM: make
> > > zap_page_range() callers that act on a single VMA use separate helper") is
> > > the initial commit that introduced unmap_single_vma(), in which the chunk
> > > of untrack_pfn() was moved over from unmap_vmas().
> > > 
> > > Restore that behavior: untrack the pfnmap only when unmapping regions.
> > > 
> > > [1] https://lore.kernel.org/r/20240523223745.395337-1-peterx@redhat.com
> > > [2] https://groups.google.com/g/syzkaller-bugs/c/FeQZvSbqWbQ/m/tHFmoZthAAAJ
> > > [3] https://lore.kernel.org/r/20240712131931.20207-1-00107082@163.com
> > > [4] https://lore.kernel.org/all/20240710-bug12-v1-1-0e5440f9b8d3@gmail.com/
> > > [5] https://lore.kernel.org/r/20240523195629.218043-1-alex.williamson@redhat.com
> > > 
> > > Cc: Alex Williamson <alex.williamson@...hat.com>
> > > Cc: Jason Gunthorpe <jgg@...dia.com>
> > > Cc: Al Viro <viro@...iv.linux.org.uk>
> > > Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> > > Cc: Andy Lutomirski <luto@...nel.org>
> > > Cc: Peter Zijlstra <peterz@...radead.org>
> > > Cc: Thomas Gleixner <tglx@...utronix.de>
> > > Cc: Ingo Molnar <mingo@...hat.com>
> > > Cc: Borislav Petkov <bp@...en8.de>
> > > Cc: Kirill A. Shutemov <kirill@...temov.name>
> > > Cc: x86@...nel.org
> > > Cc: Yan Zhao <yan.y.zhao@...el.com>
> > > Cc: Kevin Tian <kevin.tian@...el.com>
> > > Cc: Pei Li <peili.dev@...il.com>
> > > Cc: David Hildenbrand <david@...hat.com>
> > > Cc: David Wang <00107082@....com>
> > > Cc: Bert Karwatzki <spasswolf@....de>
> > > Cc: Sergey Senozhatsky <senozhatsky@...omium.org>
> > > Signed-off-by: Peter Xu <peterx@...hat.com>
> > > ---
> > > 
> > > NOTE: I massaged the commit message comparing to the rfc post [1], the
> > > patch itself is untouched.  Also removed rfc tag, and added more people
> > > into the loop. Please kindly help test this patch if you have a reproducer,
> > > as I can't reproduce it myself even with the syzbot reproducer on top of
> > > mm-unstable.  Instead of further check on the reproducer, I decided to send
> > > this out first as we have a bunch of reproducers on the list now..
> > > ---
> > >  mm/memory.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 4bcd79619574..f57cc304b318 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1827,9 +1827,6 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> > >  	if (vma->vm_file)
> > >  		uprobe_munmap(vma, start, end);
> > >  
> > > -	if (unlikely(vma->vm_flags & VM_PFNMAP))
> > > -		untrack_pfn(vma, 0, 0, mm_wr_locked);
> > > -
> > >  	if (start != end) {
> > >  		if (unlikely(is_vm_hugetlb_page(vma))) {
> > >  			/*
> > > @@ -1894,6 +1891,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
> > >  		unsigned long start = start_addr;
> > >  		unsigned long end = end_addr;
> > >  		hugetlb_zap_begin(vma, &start, &end);
> > > +		if (unlikely(vma->vm_flags & VM_PFNMAP))
> > > +			untrack_pfn(vma, 0, 0, mm_wr_locked);
> > >  		unmap_single_vma(tlb, vma, start, end, &details,
> > >  				 mm_wr_locked);
> > >  		hugetlb_zap_end(vma, &details);
> > > -- 
> > > 2.45.0
> > 
> > 
> > ...Trying to follow this discussion across several threads and bug
> > reports.   I was looped in when syzbot found that the [4] fix was a
> > deadlock.
> > 
> > How are we reaching unmap_vmas() without the mmap lock held in any mode?
> > We must be holding the read or write lock - otherwise the vma pointer is
> > unsafe...?
> 
> The report was not calling unmap_vmas() but unmap_single_vma(), and this
> patch proposed to move the untrack operation there.  We should always hold
> write lock for unmap_vmas(), afaiu.

unmap_single_vma() also takes a vma pointer.  It is in both [2] and [3].

> 
> > 
> > In any case, since this will just keep calling unmap_single_vma() it has
> > to be an incomplete fix?
> 
> I think there's indeed some issue to settle besides this patch, however I
> didn't quickly get why this patch is incomplete from this specific "untrack
> pfn within unmap_single_vma()" problem.  I thought it was complete from
> that regard, or could you elaborate otherwise?
> 

The problem reported in [2] and [3] is that we reach a call path that
includes unmap_single_vma() without the mmap lock held.  This patch
fails to address that issue; it only takes the caller with the assert
out of the call path.

Removing the function with the lock check doesn't fix the locking issue.
If there is no locking issue here, please state in the commit log why
you feel it is safe to use a vma pointer without the mmap_lock held.
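
To illustrate the point, here is a minimal userspace sketch in C.  It is
not kernel code: every model_* name, the boolean lock flag, and the two
counters are hypothetical stand-ins invented for this example.  It only
assumes that the untrack helper is the one place that checks the lock,
and it compares an unlocked caller with and without the untrack call in
the single-vma path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel API. */
struct model_mm {
	bool mmap_locked;	/* models "mmap lock held in some mode" */
};

struct model_vma {
	struct model_mm *mm;
	bool pfnmap;		/* models VM_PFNMAP being set */
	bool tracked;		/* models the pfn tracking state */
};

struct model_stats {
	int assert_hits;	/* lock checks that fired */
	int unlocked_derefs;	/* vma accesses made without the lock */
};

/* Every vma access goes through here so unlocked use can be counted. */
void model_touch_vma(const struct model_vma *vma, struct model_stats *st)
{
	assert(vma != NULL && vma->mm != NULL);
	if (!vma->mm->mmap_locked)
		st->unlocked_derefs++;
}

/* Models the untrack helper: the only place that checks the lock. */
void model_untrack_pfn(struct model_vma *vma, struct model_stats *st)
{
	if (!vma->mm->mmap_locked)
		st->assert_hits++;
	vma->tracked = false;
}

/* untrack_in_single == true models the code before the patch. */
void model_unmap_single_vma(struct model_vma *vma, struct model_stats *st,
			    bool untrack_in_single)
{
	model_touch_vma(vma, st);
	if (untrack_in_single && vma->pfnmap)
		model_untrack_pfn(vma, st);
}

/* Models a caller that reaches the single-vma path without the lock. */
struct model_stats model_unlocked_zap(bool untrack_in_single)
{
	struct model_mm mm = { .mmap_locked = false };
	struct model_vma vma = { .mm = &mm, .pfnmap = true, .tracked = true };
	struct model_stats st = { 0, 0 };

	model_unmap_single_vma(&vma, &st, untrack_in_single);
	return st;
}
```

In this model, model_unlocked_zap(true) records both a fired check and
an unlocked access, while model_unlocked_zap(false) records no fired
check but the same unlocked access.  The warning goes away; the
unlocked use of the vma pointer does not.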

Thanks,
Liam
