Message-ID: <b3b90a8e-16e9-a314-8531-e225f8a52817@redhat.com>
Date: Sun, 18 Dec 2022 10:59:49 +0100
From: David Hildenbrand <david@...hat.com>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
Yang Shi <shy828301@...il.com>,
Vlastimil Babka <vbabka@...e.cz>,
Nadav Amit <namit@...are.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>, linux-mm@...ck.org,
x86@...nel.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-csky@...r.kernel.org,
linux-hexagon@...r.kernel.org, linux-ia64@...r.kernel.org,
loongarch@...ts.linux.dev, linux-m68k@...ts.linux-m68k.org,
linux-mips@...r.kernel.org, openrisc@...ts.librecores.org,
linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-sh@...r.kernel.org, sparclinux@...r.kernel.org,
linux-um@...ts.infradead.org, linux-xtensa@...ux-xtensa.org,
Albert Ou <aou@...s.berkeley.edu>,
Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Borislav Petkov <bp@...en8.de>, Brian Cain <bcain@...cinc.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Chris Zankel <chris@...kel.net>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"David S. Miller" <davem@...emloft.net>,
Dinh Nguyen <dinguyen@...nel.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Greg Ungerer <gerg@...ux-m68k.org>,
Guo Ren <guoren@...nel.org>, Helge Deller <deller@....de>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
Johannes Berg <johannes@...solutions.net>,
Matt Turner <mattst88@...il.com>,
Max Filippov <jcmvbkbc@...il.com>,
Michael Ellerman <mpe@...erman.id.au>,
Michal Simek <monstr@...str.eu>,
Nicholas Piggin <npiggin@...il.com>,
Palmer Dabbelt <palmer@...belt.com>,
Paul Walmsley <paul.walmsley@...ive.com>,
Richard Henderson <richard.henderson@...aro.org>,
Richard Weinberger <richard@....at>,
Rich Felker <dalias@...c.org>,
Russell King <linux@...linux.org.uk>,
Stafford Horne <shorne@...il.com>,
Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Thomas Gleixner <tglx@...utronix.de>,
Vineet Gupta <vgupta@...nel.org>,
WANG Xuerui <kernel@...0n.name>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>
Subject: Re: [PATCH mm-unstable RFC 00/26] mm: support
__HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs
On 18.12.22 04:32, Huacai Chen wrote:
> Hi, David,
>
> What is the opposite of exclusive here? Shared or inclusive? I prefer
> pte_swp_mkshared() or pte_swp_mkinclusive() rather than
> pte_swp_clear_exclusive(). Existing examples: dirty/clean, young/old
> ...
Hi Huacai,
thanks for having a look!
Please note that this series doesn't add these primitives but merely
implements them on all remaining architectures.
That said, the semantics are "exclusive" vs. "maybe shared", not
"exclusive" vs. "shared" or something else. The counterpart would have to be
called pte_swp_mkmaybe_shared().
Note that this naming matches exactly how we handle the other
pte_swp_ flags we have, namely:
pte_swp_mksoft_dirty()
pte_swp_soft_dirty()
pte_swp_clear_soft_dirty()
and
pte_swp_mkuffd_wp()
pte_swp_uffd_wp()
pte_swp_clear_uffd_wp()
For example, we also (thankfully) didn't call it pte_mksoft_clean().
Grepping for "pte_swp.*soft_dirty" gives you the full picture.
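For illustration, here is a minimal sketch of how an architecture could provide the mk/test/clear triplet for the exclusive marker, following the soft_dirty and uffd_wp pattern above. The 32-bit pte_t, the pte_val()/__pte() stand-ins and the choice of bit 7 for _PAGE_SWP_EXCLUSIVE are assumptions for a made-up architecture, not any real layout.

```c
#include <stdint.h>

/* Hypothetical 32-bit PTE type and accessors, modelled on the kernel's
 * pte_val()/__pte() helpers; not taken from any real architecture. */
typedef struct { uint32_t pte; } pte_t;
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })

/* Hypothetical spare bit, assumed unused in swap PTEs on this made-up arch. */
#define _PAGE_SWP_EXCLUSIVE	(1u << 7)

/* "mk": set the marker (compare pte_swp_mksoft_dirty()). */
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
	return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
}

/* Test: is the marker set? A clear marker means "maybe shared". */
static inline int pte_swp_exclusive(pte_t pte)
{
	return !!(pte_val(pte) & _PAGE_SWP_EXCLUSIVE);
}

/* "clear": drop the marker (compare pte_swp_clear_soft_dirty()). */
static inline pte_t pte_swp_clear_exclusive(pte_t pte)
{
	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
}
```

Because a cleared marker only means "maybe shared", the natural name for dropping it is pte_swp_clear_exclusive() rather than something like pte_swp_mkshared().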
Thanks!
David
>
> Huacai
>
> On Tue, Dec 6, 2022 at 10:48 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> This is the follow-up on [1]:
>> [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of
>> anonymous pages
>>
>> After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on the most prominent
>> enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all
>> remaining architectures that support swap PTEs.
>>
>> This makes sure that exclusive anonymous pages will stay exclusive, even
>> after they were swapped out -- for example, making GUP R/W FOLL_GET of
>> anonymous pages reliable. Details can be found in [1].
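A simplified model of why the marker matters across swapout and swapin. It assumes a hypothetical struct anon_page whose anon_exclusive flag stands in for PageAnonExclusive(), and a made-up swap PTE encoding with the marker in bit 1 and the swap offset above it. The arch_has_marker parameter (also hypothetical) models whether the architecture implements the exclusive bit at all. This is a sketch of the idea, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t swp_pte_t;

/* Made-up encoding: bit 0 left clear (not present), bit 1 = marker. */
#define SWP_EXCLUSIVE_BIT	(1ull << 1)
#define SWP_OFFSET_SHIFT	2

struct anon_page {
	bool anon_exclusive;	/* stands in for PageAnonExclusive() */
};

/* Swap-out: encode the swap slot and carry the exclusive marker along,
 * but only if the (modelled) architecture has a bit for it. */
static swp_pte_t swap_out(struct anon_page *page, uint64_t offset,
			  bool arch_has_marker)
{
	swp_pte_t pte = offset << SWP_OFFSET_SHIFT;

	if (page->anon_exclusive && arch_has_marker)
		pte |= SWP_EXCLUSIVE_BIT;
	page->anon_exclusive = false;	/* the page is no longer mapped */
	return pte;
}

/* Swap-in: restore exclusivity only if the marker survived. Otherwise the
 * page is "maybe shared", and a later write fault may have to copy it. */
static void swap_in(struct anon_page *page, swp_pte_t pte)
{
	page->anon_exclusive = pte & SWP_EXCLUSIVE_BIT;
}
```

With the marker, the page comes back exclusive and a write fault can reuse it; without it, the page comes back "maybe shared", which is the kind of situation the test case failure reported further down detects.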
>>
>> This primarily fixes the remaining known O_DIRECT memory corruptions that can
>> happen on concurrent swapout, whereby we can lose DMA reads to a page
>> (modifying the user page by writing to it).
>>
>> To verify, there are two test cases (requiring swap space, obviously):
>> (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries
>> to trigger a race condition.
>> (2) My vmsplice() test case [3], which tries to detect whether the exclusive
>> marker was lost during swapout, without relying on a race condition.
>>
>>
>> For example, on 32bit x86 (with and without PAE), my test case fails
>> without these patches:
>> $ ./test_swp_exclusive
>> FAIL: page was replaced during COW
>> But succeeds with these patches:
>> $ ./test_swp_exclusive
>> PASS: page was not replaced during COW
>>
>>
>> Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even
>> the ones where swap support might be in a questionable state? This is the
>> first step towards removing "readable_exclusive" migration entries, and
>> instead using pte_swp_exclusive() also with (readable) migration entries
>> (as suggested by Peter). The only missing piece for that is
>> supporting pmd_swp_exclusive() on relevant architectures with THP
>> migration support.
>>
>> As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE,
>> we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch.
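Roughly, generic code provides no-op fallbacks for architectures that do not define __HAVE_ARCH_PTE_SWP_EXCLUSIVE, so the marker is silently lost on them. The sketch below models that pattern in user space; the pte_t type, the pte_val()/__pte() stand-ins and bit 7 are assumptions, and pte_swp_clear_exclusive() is omitted for brevity.

```c
#include <stdint.h>

/* Hypothetical PTE type and accessors, not from any real architecture. */
typedef struct { uint32_t pte; } pte_t;
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })

#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
/* Architecture provides a real marker bit (bit 7 is an assumption). */
#define _PAGE_SWP_EXCLUSIVE	(1u << 7)
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
	return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
}
static inline int pte_swp_exclusive(pte_t pte)
{
	return !!(pte_val(pte) & _PAGE_SWP_EXCLUSIVE);
}
#else
/* Generic no-op fallbacks: setting the marker does nothing,
 * and testing it always reports "maybe shared". */
static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
	return pte;
}
static inline int pte_swp_exclusive(pte_t pte)
{
	(void)pte;
	return 0;
}
#endif
```

Built as-is, the fallback branch is used and pte_swp_exclusive() always returns 0; built with -D__HAVE_ARCH_PTE_SWP_EXCLUSIVE, the marker round-trips. Once every architecture with swap PTEs provides the real helpers, the #ifdef and the fallback branch have nothing left to do.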
>>
>>
>> RFC because some of the swap PTE layouts are really tricky and I really
>> need some feedback related to deciphering these layouts and "using yet
>> unused PTE bits in swap PTEs". I tried cross-compiling all relevant setups
>> (phew, I might only miss some power/nohash variants), but only tested on
>> x86 so far.
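To make "using yet unused PTE bits in swap PTEs" concrete, here is a sketch of a hypothetical 32-bit swap PTE layout in which a previously unused bit between the type and offset fields is repurposed as the exclusive marker. The layout, the field widths and the helper names (mk_swap_pte(), swap_pte_type(), swap_pte_offset()) are made up for illustration; every real architecture has its own constraints.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit swap PTE layout (not any real architecture):
 *
 *   31                   7   6   5        1   0
 *   <---- offset (25) ----> <E> <type (5)> <P=0>
 *
 * P: present bit, must stay 0 for a swap PTE
 * E: exclusive marker, carved out of a previously unused bit
 */
#define SWP_TYPE_SHIFT		1
#define SWP_TYPE_BITS		5
#define SWP_TYPE_MASK		((1u << SWP_TYPE_BITS) - 1)
#define SWP_EXCLUSIVE_BIT	(1u << 6)
#define SWP_OFFSET_SHIFT	7

/* Build a swap PTE from a swap type and a (25-bit) swap offset. */
static inline uint32_t mk_swap_pte(unsigned int type, uint32_t offset)
{
	return ((type & SWP_TYPE_MASK) << SWP_TYPE_SHIFT) |
	       (offset << SWP_OFFSET_SHIFT);
}

/* Decode the type field; the marker bit lies outside the mask. */
static inline unsigned int swap_pte_type(uint32_t pte)
{
	return (pte >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK;
}

/* Decode the offset field; the marker bit is shifted out. */
static inline uint32_t swap_pte_offset(uint32_t pte)
{
	return pte >> SWP_OFFSET_SHIFT;
}
```

The property checked here, that setting the marker leaves the decoded type and offset intact and the present bit clear, is the essential one for any such layout.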
>>
>> CCing arch maintainers only on this cover letter and on the respective
>> patch(es).
>>
>>
>> [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com
>> [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c
>> [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c
>>
>> David Hildenbrand (26):
>> mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks
>> alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> m68k/mm: remove dummy __swp definitions for nommu
>> m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> nios2/mm: refactor swap PTE layout
>> nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s
>> powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit
>> sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit
>> um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit
>> xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>> mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE
>>
>> arch/alpha/include/asm/pgtable.h | 40 ++++++++-
>> arch/arc/include/asm/pgtable-bits-arcv2.h | 26 +++++-
>> arch/arm/include/asm/pgtable-2level.h | 3 +
>> arch/arm/include/asm/pgtable-3level.h | 3 +
>> arch/arm/include/asm/pgtable.h | 34 ++++++--
>> arch/arm64/include/asm/pgtable.h | 1 -
>> arch/csky/abiv1/inc/abi/pgtable-bits.h | 13 ++-
>> arch/csky/abiv2/inc/abi/pgtable-bits.h | 19 ++--
>> arch/csky/include/asm/pgtable.h | 17 ++++
>> arch/hexagon/include/asm/pgtable.h | 36 ++++++--
>> arch/ia64/include/asm/pgtable.h | 31 ++++++-
>> arch/loongarch/include/asm/pgtable-bits.h | 4 +
>> arch/loongarch/include/asm/pgtable.h | 38 +++++++-
>> arch/m68k/include/asm/mcf_pgtable.h | 35 +++++++-
>> arch/m68k/include/asm/motorola_pgtable.h | 37 +++++++-
>> arch/m68k/include/asm/pgtable_no.h | 6 --
>> arch/m68k/include/asm/sun3_pgtable.h | 38 +++++++-
>> arch/microblaze/include/asm/pgtable.h | 44 +++++++---
>> arch/mips/include/asm/pgtable-32.h | 86 ++++++++++++++++---
>> arch/mips/include/asm/pgtable-64.h | 23 ++++-
>> arch/mips/include/asm/pgtable.h | 35 ++++++++
>> arch/nios2/include/asm/pgtable-bits.h | 3 +
>> arch/nios2/include/asm/pgtable.h | 37 ++++++--
>> arch/openrisc/include/asm/pgtable.h | 40 +++++++--
>> arch/parisc/include/asm/pgtable.h | 40 ++++++++-
>> arch/powerpc/include/asm/book3s/32/pgtable.h | 37 ++++++--
>> arch/powerpc/include/asm/book3s/64/pgtable.h | 1 -
>> arch/powerpc/include/asm/nohash/32/pgtable.h | 22 +++--
>> arch/powerpc/include/asm/nohash/32/pte-40x.h | 6 +-
>> arch/powerpc/include/asm/nohash/32/pte-44x.h | 18 +---
>> arch/powerpc/include/asm/nohash/32/pte-85xx.h | 4 +-
>> arch/powerpc/include/asm/nohash/64/pgtable.h | 24 +++++-
>> arch/powerpc/include/asm/nohash/pgtable.h | 15 ++++
>> arch/powerpc/include/asm/nohash/pte-e500.h | 1 -
>> arch/riscv/include/asm/pgtable-bits.h | 3 +
>> arch/riscv/include/asm/pgtable.h | 28 ++++--
>> arch/s390/include/asm/pgtable.h | 1 -
>> arch/sh/include/asm/pgtable_32.h | 53 +++++++++---
>> arch/sparc/include/asm/pgtable_32.h | 26 +++++-
>> arch/sparc/include/asm/pgtable_64.h | 37 +++++++-
>> arch/sparc/include/asm/pgtsrmmu.h | 14 +--
>> arch/um/include/asm/pgtable.h | 36 +++++++-
>> arch/x86/include/asm/pgtable-2level.h | 26 ++++--
>> arch/x86/include/asm/pgtable-3level.h | 26 +++++-
>> arch/x86/include/asm/pgtable.h | 3 -
>> arch/xtensa/include/asm/pgtable.h | 31 +++++--
>> include/linux/pgtable.h | 29 -------
>> mm/debug_vm_pgtable.c | 25 +++++-
>> mm/memory.c | 4 -
>> mm/rmap.c | 11 ---
>> 50 files changed, 943 insertions(+), 227 deletions(-)
>>
>> --
>> 2.38.1
>>
>>
>
--
Thanks,
David / dhildenb