Message-Id: <cover.f15b25597fc3afd45b144df863eeca3b2c13f9f4.1664171943.git-series.apopple@nvidia.com>
Date: Mon, 26 Sep 2022 16:03:04 +1000
From: Alistair Popple <apopple@...dia.com>
To: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>
Cc: Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Felix Kuehling <Felix.Kuehling@....com>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
"Pan, Xinhui" <Xinhui.Pan@....com>,
David Airlie <airlied@...ux.ie>,
Daniel Vetter <daniel@...ll.ch>,
Ben Skeggs <bskeggs@...hat.com>,
Karol Herbst <kherbst@...hat.com>,
Lyude Paul <lyude@...hat.com>,
Ralph Campbell <rcampbell@...dia.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Alex Sierra <alex.sierra@....com>,
John Hubbard <jhubbard@...dia.com>,
linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, Jason Gunthorpe <jgg@...dia.com>,
Dan Williams <dan.j.williams@...el.com>,
Alistair Popple <apopple@...dia.com>
Subject: [PATCH 0/7] Fix several device private page reference counting issues

This series aims to fix a number of page reference counting issues in drivers
dealing with device private ZONE_DEVICE pages. These result in use-after-free
type bugs, either from accessing a struct page which no longer exists because it
has been removed or accessing fields within the struct page which are no longer
valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However,
without these fixes it is possible to crash the kernel from userspace. These
crashes can be triggered either by unloading the kernel module or unbinding the
device from the driver prior to a userspace task exiting. In modules such as
Nouveau it is also possible to trigger some of these issues by explicitly
closing the device file descriptor prior to the task exiting and then accessing
device private memory.

This involves changes to both PowerPC and AMD GPU code. Unfortunately I lack the
hardware to test on either of these, so I would appreciate it if someone with
access could test those.

Alistair Popple (7):
  mm/memory.c: Fix race when faulting a device private page
  mm: Free device private pages have zero refcount
  mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()
  mm/migrate_device.c: Add migrate_device_range()
  nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
  nouveau/dmem: Evict device private memory during release
  hmm-tests: Add test for migrate_device_range()

 arch/powerpc/kvm/book3s_hv_uvmem.c       |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c     |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++++++----
 include/linux/migrate.h                  |  15 ++-
 lib/test_hmm.c                           | 127 ++++++++++---
 lib/test_hmm_uapi.h                      |   1 +-
 mm/memory.c                              |  16 +-
 mm/memremap.c                            |   5 +-
 mm/migrate.c                             |  34 +--
 mm/migrate_device.c                      | 239 +++++++++++++++++-------
 mm/page_alloc.c                          |   6 +-
 tools/testing/selftests/vm/hmm-tests.c   |  49 +++++-
 14 files changed, 487 insertions(+), 160 deletions(-)

base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786
--
git-series 0.9.1