Message-ID: <20190620142918.GE9832@kroah.com>
Date: Thu, 20 Jun 2019 16:29:18 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Michal Hocko <mhocko@...nel.org>
Cc: Stable tree <stable@...r.kernel.org>,
Jason Gunthorpe <jgg@...lanox.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Jann Horn <jannh@...gle.com>, Oleg Nesterov <oleg@...hat.com>,
Peter Xu <peterx@...hat.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Michal Hocko <mhocko@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH stable-4.4 v3] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping

On Mon, Jun 17, 2019 at 08:58:24AM +0200, Michal Hocko wrote:
> From: Andrea Arcangeli <aarcange@...hat.com>
>
> Upstream 04f5866e41fb70690e28397487d8bd8eea7d712a commit.
>
> The core dumping code has always run without holding the mmap_sem for
> writing, even though that is the only way to ensure that the entire vma
> layout will not change from under it. Only using some signal
> serialization on the processes belonging to the mm is not nearly enough.
> This was pointed out earlier. For example in Hugh's post from Jul 2017:
>
> https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
>
> "Not strictly relevant here, but a related note: I was very surprised
> to discover, only quite recently, how handle_mm_fault() may be called
> without down_read(mmap_sem) - when core dumping. That seems a
> misguided optimization to me, which would also be nice to correct"
>
> In particular, because growsdown and growsup can move the
> vm_start/vm_end, the various loops the core dump does around the vma
> will not be consistent if page faults can happen concurrently.
>
> Pretty much all users calling mmget_not_zero()/get_task_mm() and then
> taking the mmap_sem had the potential to introduce unexpected side
> effects in the core dumping code.
>
> Adding mmap_sem for writing around the ->core_dump invocation is a
> viable long term fix, but it requires removing all copy user and page
> faults and replacing them with get_dump_page() for all binary formats,
> which is not suitable as a short term fix.
>
> For the time being this solution manually covers the places that can
> confuse the core dump either by altering the vma layout or the vma flags
> while it runs. Once ->core_dump runs under mmap_sem for writing the
> function mmget_still_valid() can be dropped.
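For readers following along, the check described above can be sketched in userspace C. This is only a rough model under stated assumptions: struct mm_model, down_write_model(), up_write_model() and set_vma_flags_model() are hypothetical stand-ins invented for illustration, not kernel code. Only the idea that mmget_still_valid() reports false once mm->core_state is set comes from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the parts of mm_struct that matter here. */
struct mm_model {
    bool mmap_sem_held_for_write;  /* models mmap_sem taken for writing */
    const void *core_state;        /* non-NULL once a core dump has started */
    unsigned long vm_flags;        /* models the flags of a single vma */
};

/* Stub lock helpers: they only record state, they provide no real exclusion. */
static void down_write_model(struct mm_model *mm) { mm->mmap_sem_held_for_write = true; }
static void up_write_model(struct mm_model *mm)   { mm->mmap_sem_held_for_write = false; }

/* Mirrors the idea of the helper: the mm is "still valid" until a dump begins. */
static bool mmget_still_valid(const struct mm_model *mm)
{
    return mm->core_state == NULL;
}

/*
 * Hypothetical caller that got the mm via mmget_not_zero()/get_task_mm()
 * and wants to change vma flags. It re-checks after taking the lock and
 * backs out without touching anything if a core dump is in progress.
 */
static int set_vma_flags_model(struct mm_model *mm, unsigned long flags)
{
    int ret = -1;

    down_write_model(mm);
    if (!mmget_still_valid(mm))
        goto out_unlock;
    mm->vm_flags |= flags;
    ret = 0;
out_unlock:
    up_write_model(mm);
    return ret;
}
```

The check has to happen after the lock is taken, otherwise the dump could start between the check and the modification.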
>
> Allowing mmap_sem protected sections to run in parallel with the
> coredump provides some minor parallelism advantage to the swapoff code
> (which seems to be safe enough by never mangling any vma field and can
> keep doing swapins in parallel to the core dumping) and to some other
> corner case.
>
> In order to facilitate the backporting I added "Fixes: 86039bd3b4e6";
> however, the side effect of this same race condition in /proc/pid/mem
> should be reproducible since before 2.6.12-rc2, so I couldn't add any
> other "Fixes:" because there's no hash beyond the git genesis commit.
>
> Because find_extend_vma() is the only location outside of the process
> context that could modify the "mm" structures under mmap_sem for
> reading, by adding the mmget_still_valid() check to it, all other cases
> that take the mmap_sem for reading don't need the new check after
> mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
> context also doesn't need the new check, because all tasks under core
> dumping are frozen.
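The find_extend_vma() reasoning above can be illustrated with a small userspace model. This is a sketch under assumptions, not the mm/mmap.c code: the types, the single grows-down vma, the 4 KiB page mask and the name find_extend_vma_model() are all made up for the example. It shows why plain lookups under the read lock stay harmless while the one path that would move vm_start refuses to run during a dump.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_MASK_MODEL (~0xfffUL)  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a stack vma. */
struct vma_model {
    unsigned long vm_start;
    unsigned long vm_end;
    bool grows_down;
};

/* Hypothetical stand-in for an mm holding exactly one vma, for brevity. */
struct mm_model {
    const void *core_state;   /* non-NULL once a core dump has started */
    struct vma_model stack;
};

static bool mmget_still_valid(const struct mm_model *mm)
{
    return mm->core_state == NULL;
}

/*
 * Simplified lookup: return the vma covering addr, extending a grows-down
 * vma downwards if addr lies just below it. The extension is the only
 * step that modifies the layout, so it is the only step that is gated.
 */
static struct vma_model *find_extend_vma_model(struct mm_model *mm,
                                               unsigned long addr)
{
    struct vma_model *vma = &mm->stack;

    if (addr >= vma->vm_end)
        return NULL;
    if (addr >= vma->vm_start)
        return vma;               /* pure lookup, nothing changes */
    if (!vma->grows_down)
        return NULL;
    /* don't alter vm_start if the coredump is running */
    if (!mmget_still_valid(mm))
        return NULL;
    vma->vm_start = addr & PAGE_MASK_MODEL;
    return vma;
}
```

In this model a lookup inside the existing range still succeeds during a dump; only the growth is refused.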
>
> Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
> Reported-by: Jann Horn <jannh@...gle.com>
> Suggested-by: Oleg Nesterov <oleg@...hat.com>
> Acked-by: Peter Xu <peterx@...hat.com>
> Reviewed-by: Mike Rapoport <rppt@...ux.ibm.com>
> Reviewed-by: Oleg Nesterov <oleg@...hat.com>
> Reviewed-by: Jann Horn <jannh@...gle.com>
> Acked-by: Jason Gunthorpe <jgg@...lanox.com>
> Acked-by: Michal Hocko <mhocko@...e.com>
> Cc: <stable@...r.kernel.org>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Joel Fernandes (Google) <joel@...lfernandes.org>
> [mhocko@...e.com: stable 4.4 backport
>  - drop infiniband part because of missing 5f9794dc94f59
>  - drop userfaultfd_event_wait_completion hunk because of
>    missing 9cd75c3cd4c3d
>  - handle binder_update_page_range because of missing 720c241924046
>  - handle mlx5_ib_disassociate_ucontext - akaher@...are.com
> ]
> Signed-off-by: Michal Hocko <mhocko@...e.com>
> ---
>  drivers/android/binder.c          |  6 ++++++
>  drivers/infiniband/hw/mlx4/main.c |  3 +++
>  fs/proc/task_mmu.c                | 18 ++++++++++++++++++
>  fs/userfaultfd.c                  | 10 ++++++++--
>  include/linux/mm.h                | 21 +++++++++++++++++++++
>  mm/mmap.c                         |  7 ++++++-
>  6 files changed, 62 insertions(+), 3 deletions(-)
I've queued this up now, as it looks like everyone agrees with it. What
about a 4.9.y backport?

thanks,

greg k-h