[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190322111305.077729968@linuxfoundation.org>
Date:   Fri, 22 Mar 2019 12:15:30 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Jan Stancek <jstancek@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Rafael Aquini <aquini@...hat.com>,
        Minchan Kim <minchan@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Rik van Riel <riel@...riel.com>,
        Michal Hocko <mhocko@...e.com>,
        Huang Ying <ying.huang@...el.com>,
        Souptick Joarder <jrdr.linux@...il.com>,
        Jerome Glisse <jglisse@...hat.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: [PATCH 5.0 111/238] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
5.0-stable review patch.  If anyone has any objections, please let me know.
------------------
From: Jan Stancek <jstancek@...hat.com>
commit fc8efd2ddfed3f343c11b693e87140ff358d7ff5 upstream.
LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:
  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>]           (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire.  This crash is
reproducible at least since 4.14.
Problem is that "vmf->vma" used in do_fault() can become stale.  Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" be returned to kmem cache, and get zeroed/re-initialized and
re-used:
handle_mm_fault                           |
  __handle_mm_fault                       |
    do_fault                              |
      vma = vmf->vma                      |
      do_read_fault                       |
        __do_fault                        |
          vma->vm_ops->fault(vmf);        |
            mmap_sem is released          |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |
Cache mm_struct to avoid using potentially stale "vma".
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@redhat.com
Signed-off-by: Jan Stancek <jstancek@...hat.com>
Reviewed-by: Andrea Arcangeli <aarcange@...hat.com>
Reviewed-by: Matthew Wilcox <willy@...radead.org>
Acked-by: Rafael Aquini <aquini@...hat.com>
Reviewed-by: Minchan Kim <minchan@...nel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
Cc: Rik van Riel <riel@...riel.com>
Cc: Michal Hocko <mhocko@...e.com>
Cc: Huang Ying <ying.huang@...el.com>
Cc: Souptick Joarder <jrdr.linux@...il.com>
Cc: Jerome Glisse <jglisse@...hat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@...ux.ibm.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Andrea Arcangeli <aarcange@...hat.com>
Cc: David Rientjes <rientjes@...gle.com>
Cc: Mel Gorman <mgorman@...hsingularity.net>
Cc: <stable@...r.kernel.org>
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
 mm/memory.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,10 +3517,13 @@ static vm_fault_t do_shared_fault(struct
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by other thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = vma->vm_mm;
 	vm_fault_t ret;
 
 	/*
@@ -3561,7 +3564,7 @@ static vm_fault_t do_fault(struct vm_fau
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
 	return ret;
Powered by blists - more mailing lists
 
