Message-Id: <20221116102659.70287-1-david@redhat.com>
Date: Wed, 16 Nov 2022 11:26:39 +0100
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: x86@...nel.org, linux-alpha@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-ia64@...r.kernel.org,
linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
sparclinux@...r.kernel.org, linux-um@...ts.infradead.org,
etnaviv@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
linux-samsung-soc@...r.kernel.org, linux-rdma@...r.kernel.org,
linux-media@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-perf-users@...r.kernel.org,
linux-security-module@...r.kernel.org,
linux-kselftest@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Jason Gunthorpe <jgg@...pe.ca>,
John Hubbard <jhubbard@...dia.com>,
Peter Xu <peterx@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Hugh Dickins <hughd@...gle.com>, Nadav Amit <namit@...are.com>,
Vlastimil Babka <vbabka@...e.cz>,
Matthew Wilcox <willy@...radead.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <songmuchun@...edance.com>,
Shuah Khan <shuah@...nel.org>,
Lucas Stach <l.stach@...gutronix.de>,
David Airlie <airlied@...il.com>,
Oded Gabbay <ogabbay@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
Christoph Hellwig <hch@...radead.org>,
Alex Williamson <alex.williamson@...hat.com>,
David Hildenbrand <david@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Andy Walls <awalls@...metrocast.net>,
Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Bernard Metzler <bmt@...ich.ibm.com>,
Borislav Petkov <bp@...en8.de>,
Catalin Marinas <catalin.marinas@....com>,
Christian Benvenuti <benve@...co.com>,
Christian Gmeiner <christian.gmeiner@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Daniel Vetter <daniel@...ll.ch>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"David S. Miller" <davem@...emloft.net>,
Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
Eric Biederman <ebiederm@...ssion.com>,
Hans Verkuil <hverkuil@...all.nl>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
Inki Dae <inki.dae@...sung.com>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>,
James Morris <jmorris@...ei.org>, Jiri Olsa <jolsa@...nel.org>,
Johannes Berg <johannes@...solutions.net>,
Kees Cook <keescook@...omium.org>,
Kentaro Takeda <takedakn@...data.co.jp>,
Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
Kyungmin Park <kyungmin.park@...sung.com>,
Leon Romanovsky <leon@...nel.org>,
Leon Romanovsky <leonro@...dia.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Mark Rutland <mark.rutland@....com>,
Matt Turner <mattst88@...il.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Namhyung Kim <namhyung@...nel.org>,
Nelson Escobar <neescoba@...co.com>,
Nicholas Piggin <npiggin@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Paul Moore <paul@...l-moore.com>,
Peter Zijlstra <peterz@...radead.org>,
Richard Henderson <richard.henderson@...aro.org>,
Richard Weinberger <richard@....at>,
Russell King <linux+etnaviv@...linux.org.uk>,
"Serge E. Hallyn" <serge@...lyn.com>,
Seung-Woo Kim <sw0312.kim@...sung.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Thomas Gleixner <tglx@...utronix.de>,
Tomasz Figa <tfiga@...omium.org>, Will Deacon <will@...nel.org>
Subject: [PATCH mm-unstable v1 00/20] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)

Until now, we have not supported reliable R/O long-term pinning in COW
mappings. That means that if we trigger R/O long-term pinning in a
MAP_PRIVATE mapping, we can end up pinning the (R/O-mapped) shared zeropage
or a pagecache page.

The next write access would trigger a write fault and replace the pinned
page with an exclusive anonymous page in the process page table; whatever
the process then writes to that private page copy is not visible to the
owner of the previous page pin: for example, RDMA could read stale data.
The end result is essentially unexpected and hard-to-debug memory
corruption.
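
The effect described above can be approximated from user space. The
following is only a minimal sketch under stated assumptions, not part of
this series: a MAP_SHARED mapping of a temporary file stands in for "the
owner of the previous page pin" (it keeps looking at the pagecache page),
while a MAP_PRIVATE mapping of the same file plays the process that takes
the write fault. The helper name cow_hides_private_write() and the /tmp
path template are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write through the MAP_PRIVATE view stays invisible to
 * the view of the pagecache page, 0 if it is visible, -1 on setup error.
 */
static int cow_hides_private_write(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* hypothetical scratch file */
	long psz = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	char *pagecache, *priv;
	int hidden;

	if (fd < 0 || psz <= 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, psz) != 0) {
		close(fd);
		return -1;
	}

	/* Stand-in for the pin owner: always sees the pagecache page. */
	pagecache = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, 0);
	/* The COW mapping of the process. */
	priv = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);
	if (pagecache == MAP_FAILED || priv == MAP_FAILED)
		return -1;

	/* Read fault: the private mapping now maps the pagecache page R/O. */
	volatile char c = priv[0];
	(void)c;

	/* Write fault: COW replaces it with an exclusive anonymous page. */
	priv[0] = 'X';

	/* The pagecache view still holds the old (zero) byte. */
	hidden = (priv[0] == 'X' && pagecache[0] == 0);

	munmap(pagecache, psz);
	munmap(priv, psz);
	return hidden;
}
```

Calling cow_hides_private_write() from a small main() is expected to
return 1 on Linux: the write lands in the private copy, and anything still
referencing the original page keeps reading the old content.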

Some drivers have tried to work around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning.
FOLL_WRITE triggers a write fault, if required, and breaks COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted that case to work as well, just as one
would expect (no write permissions, but still triggering a write access to
break COW).

However, that is not a practical solution, because:

(1) Drivers that don't stick to that undocumented and debatable pattern
    still run into that issue. For example, VFIO only uses
    FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
    limitation is unintuitive. FOLL_WRITE would, for example, mark the
    page softdirty or trigger uffd-wp, even though there isn't actually
    going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without
    VMA permissions by arbitrary drivers.

So instead, make R/O long-term pinning work as expected by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #9.

Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#8 prepare core MM for extended FAULT_FLAG_UNSHARE support in
COW mappings.
Patch #9 implements reliable R/O long-term pinning in COW mappings.
Patches #10--#19 remove any FOLL_FORCE usage from drivers.
Patch #20 renames FOLL_FORCE to FOLL_PTRACE.
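
The driver conversions that remove FOLL_FORCE can be pictured roughly as
follows. This is a user-space model only, not code from any of the
patches: the DEMO_FOLL_* values are illustrative stand-ins and are not the
kernel's definitions, both helper names are made up, and the shape of the
old workaround (adding FOLL_FORCE only when the mapping is to be pinned
R/O) is an assumption about how a typical driver implemented it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; NOT the values from include/linux/mm.h. */
#define DEMO_FOLL_WRITE		0x1u
#define DEMO_FOLL_FORCE		0x2u
#define DEMO_FOLL_LONGTERM	0x4u

/* Assumed old workaround: always break COW via a (forced) write fault. */
static unsigned int gup_flags_workaround(bool writable)
{
	unsigned int flags = DEMO_FOLL_LONGTERM | DEMO_FOLL_WRITE;

	/* R/O pin: the VMA might lack write permissions, so force it. */
	if (!writable)
		flags |= DEMO_FOLL_FORCE;
	return flags;
}

/* After the series: request write access only when it is really needed. */
static unsigned int gup_flags_after_series(bool writable)
{
	unsigned int flags = DEMO_FOLL_LONGTERM;

	if (writable)
		flags |= DEMO_FOLL_WRITE;
	return flags;
}
```

In this model an R/O long-term pin ends up with only the long-term flag
set, which matches what VFIO is described as doing above; breaking COW is
then left to core MM instead of being requested by the driver.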

I'm refraining from CCing all driver/arch maintainers on the whole patch
set, and only CC them on the cover letter and the applicable patch
(I know, I know, someone is always unhappy ... sorry).

RFC -> v1:
* Use term "ptrace" instead of "debuggers" in patch descriptions
* Added ACK/Tested-by
* "mm/frame-vector: remove FOLL_FORCE usage"
 -> Adjust description
* "mm: rename FOLL_FORCE to FOLL_PTRACE"
 -> Added

David Hildenbrand (20):
  selftests/vm: anon_cow: prepare for non-anonymous COW tests
  selftests/vm: cow: basic COW tests for non-anonymous pages
  selftests/vm: cow: R/O long-term pinning reliability tests for
    non-anon pages
  mm: add early FAULT_FLAG_UNSHARE consistency checks
  mm: add early FAULT_FLAG_WRITE consistency checks
  mm: rework handling in do_wp_page() based on private vs. shared
    mappings
  mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for
    private mappings
  mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
  mm/gup: reliable R/O long-term pinning in COW mappings
  RDMA/umem: remove FOLL_FORCE usage
  RDMA/usnic: remove FOLL_FORCE usage
  RDMA/siw: remove FOLL_FORCE usage
  media: videobuf-dma-sg: remove FOLL_FORCE usage
  drm/etnaviv: remove FOLL_FORCE usage
  media: pci/ivtv: remove FOLL_FORCE usage
  mm/frame-vector: remove FOLL_FORCE usage
  drm/exynos: remove FOLL_FORCE usage
  RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
  habanalabs: remove FOLL_FORCE usage
  mm: rename FOLL_FORCE to FOLL_PTRACE

arch/alpha/kernel/ptrace.c | 6 +-
arch/arm64/kernel/mte.c | 2 +-
arch/ia64/kernel/ptrace.c | 10 +-
arch/mips/kernel/ptrace32.c | 4 +-
arch/mips/math-emu/dsemul.c | 2 +-
arch/powerpc/kernel/ptrace/ptrace32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 4 +-
arch/sparc/kernel/ptrace_64.c | 8 +-
arch/x86/kernel/step.c | 2 +-
arch/x86/um/ptrace_32.c | 2 +-
arch/x86/um/ptrace_64.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +-
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
drivers/infiniband/core/umem.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/infiniband/sw/siw/siw_mem.c | 9 +-
drivers/media/common/videobuf2/frame_vector.c | 2 +-
drivers/media/pci/ivtv/ivtv-udma.c | 2 +-
drivers/media/pci/ivtv/ivtv-yuv.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +-
drivers/misc/habanalabs/common/memory.c | 3 +-
fs/exec.c | 2 +-
fs/proc/base.c | 2 +-
include/linux/mm.h | 35 +-
include/linux/mm_types.h | 8 +-
kernel/events/uprobes.c | 4 +-
kernel/ptrace.c | 12 +-
mm/gup.c | 38 +-
mm/huge_memory.c | 13 +-
mm/hugetlb.c | 14 +-
mm/memory.c | 97 +++--
mm/util.c | 4 +-
security/tomoyo/domain.c | 2 +-
tools/testing/selftests/vm/.gitignore | 2 +-
tools/testing/selftests/vm/Makefile | 10 +-
tools/testing/selftests/vm/check_config.sh | 4 +-
.../selftests/vm/{anon_cow.c => cow.c} | 387 +++++++++++++++++-
tools/testing/selftests/vm/run_vmtests.sh | 8 +-
39 files changed, 575 insertions(+), 177 deletions(-)
rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (75%)
--
2.38.1