Message-Id: <20231207150348.82096-1-alexghiti@rivosinc.com>
Date: Thu, 7 Dec 2023 16:03:44 +0100
From: Alexandre Ghiti <alexghiti@...osinc.com>
To: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Michael Ellerman <mpe@...erman.id.au>,
Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
Andrew Morton <akpm@...ux-foundation.org>,
Ved Shanbhogue <ved@...osinc.com>,
Matt Evans <mev@...osinc.com>,
Dylan Jhong <dylan@...estech.com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-mm@...ck.org
Cc: Alexandre Ghiti <alexghiti@...osinc.com>
Subject: [PATCH RFC/RFT 0/4] Remove preventive sfence.vma

In RISC-V, after a new mapping is established, an sfence.vma needs to be
emitted for different reasons (a simplified sketch of the preventive
pattern follows this list):

- if the uarch caches invalid entries, we need to invalidate them,
  otherwise we would trap on such an invalid entry,
- if the uarch does not cache invalid entries, a reordered access could
  fail to see the new mapping and then trap (sfence.vma acts as a fence).

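For illustration, the preventive pattern looks roughly like this today
(a hedged sketch; example_set_pte() is an illustrative name, not an
actual kernel helper):

  /*
   * Sketch: after establishing a new PTE, an sfence.vma is emitted
   * for the address even though, on many uarchs, nothing stale can
   * be cached for it. This is the flush the series tries to remove.
   */
  static void example_set_pte(pte_t *ptep, pte_t pte, unsigned long addr)
  {
          WRITE_ONCE(*ptep, pte);
          /* preventive single-address flush */
          asm volatile ("sfence.vma %0" : : "r" (addr) : "memory");
  }
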
We can actually avoid emitting those (mostly) useless and costly
sfence.vma instructions by handling the traps instead:

- for new kernel mappings: only vmalloc mappings need to be taken care
  of; other new mappings are rare and already emit the required
  sfence.vma if needed. That must be done very early in the exception
  path, as explained in patch 1, and it also fixes our fragile way of
  dealing with vmalloc faults (see the sketch after this list),
- for new user mappings: those can be handled in the page fault path,
  as done in patch 3.

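To give an idea of the kernel-mapping side, here is a hedged C sketch
of the trap-side fixup (fixup_vmalloc_fault() is an illustrative name;
patch 1 actually does this in entry.S, very early in the exception
path):

  /*
   * Sketch: a kernel fault in the vmalloc range is assumed to come
   * from an access reordered before the page table update became
   * visible. A local flush orders things, and the faulting access
   * is simply retried; no preventive sfence.vma was needed at
   * mapping time.
   */
  static bool fixup_vmalloc_fault(unsigned long addr)
  {
          if (addr < VMALLOC_START || addr >= VMALLOC_END)
                  return false;

          local_flush_tlb_all();
          return true;    /* retry the faulting instruction */
  }
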
Patch 2 is certainly a TEMP patch which allows detecting at runtime
whether a uarch caches invalid TLB entries; one possible probe is
sketched below.

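A hedged sketch of such a boot-time probe, with illustrative names
(this is not necessarily how patch 2 implements it): access an address
through a non-present PTE, make the PTE valid from the fault handler
without emitting sfence.vma, and retry. A second fault on the same
address means the uarch served the stale invalid entry from its TLB.

  static bool tlb_caches_invalid_entries;

  /* called from a boot-time fault handler for the probe address */
  static void tlb_probe_fault(pte_t *ptep, int fault_count)
  {
          if (fault_count == 2) {
                  /* the invalid entry was cached: remember and clean up */
                  tlb_caches_invalid_entries = true;
                  local_flush_tlb_all();
                  return;
          }
          /* make the PTE valid, deliberately without sfence.vma */
          WRITE_ONCE(*ptep, __pte(pte_val(*ptep) | _PAGE_PRESENT));
  }
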
Patch 4 is a TEMP patch which exposes through debugfs counters for the
different sfence.vma calls that are emitted, which can be used for
benchmarking (see the sketch below).

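Such a counter only takes a few lines of debugfs code; a minimal
sketch, assuming a global nr_sfence_vma counter incremented at the
flush call sites (names are illustrative, not necessarily those of
patch 4):

  #include <linux/debugfs.h>

  static u64 nr_sfence_vma;

  static int __init sfence_debugfs_init(void)
  {
          /* read-only file under /sys/kernel/debug */
          debugfs_create_u64("nr_sfence_vma", 0444, NULL, &nr_sfence_vma);
          return 0;
  }
  device_initcall(sfence_debugfs_init);
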
On our uarch, which does not cache invalid entries, and with a 6.5
kernel, the gains are measurable:

* Kernel boot: 6%
* ltp - mmapstress01: 8%
* lmbench - lat_pagefault: 20%
* lmbench - lat_mmap: 5%

On uarchs that cache invalid entries, the results are more mixed and
need to be explored more thoroughly (if anyone is interested!): this
can be explained by the extra page faults, which, depending on how
aggressively the uarch caches invalid entries, could cancel out the
benefits of removing the preventive sfence.vma.

Ved Shanbhogue has prepared a new extension to be used by uarchs that
do not cache invalid entries, which will certainly be used instead of
patch 2.

Thanks to Ved and Matt Evans for triggering the discussion that led to
this patchset!

This is an RFC, so please don't mind the checkpatch warnings and dirty
comments. It applies on top of 6.6.

Any feedback, tests or relevant benchmarks are welcome :)

Alexandre Ghiti (4):
  riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
  riscv: Add a runtime detection of invalid TLB entries caching
  riscv: Stop emitting preventive sfence.vma for new userspace mappings
  TEMP: riscv: Add debugfs interface to retrieve #sfence.vma

arch/arm64/include/asm/pgtable.h | 2 +-
arch/mips/include/asm/pgtable.h | 6 +-
arch/powerpc/include/asm/book3s/64/tlbflush.h | 8 +-
arch/riscv/include/asm/cacheflush.h | 19 ++-
arch/riscv/include/asm/pgtable.h | 45 ++++---
arch/riscv/include/asm/thread_info.h | 5 +
arch/riscv/include/asm/tlbflush.h | 4 +
arch/riscv/kernel/asm-offsets.c | 5 +
arch/riscv/kernel/entry.S | 94 +++++++++++++
arch/riscv/kernel/sbi.c | 12 ++
arch/riscv/mm/init.c | 126 ++++++++++++++++++
arch/riscv/mm/tlbflush.c | 17 +++
include/linux/pgtable.h | 8 +-
mm/memory.c | 12 +-
14 files changed, 331 insertions(+), 32 deletions(-)
--
2.39.2