Message-ID: <0dc2afaf8a976ef8eb9af711fd941f1bbfd71321.camel@mediatek.com>
Date: Wed, 13 Sep 2023 08:11:40 +0000
From: Kuan-Ying Lee (李冠穎)
<Kuan-Ying.Lee@...iatek.com>
To: "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"hughd@...gle.com" <hughd@...gle.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"maz@...nel.org" <maz@...nel.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"rppt@...nel.org" <rppt@...nel.org>,
"yuzenghui@...wei.com" <yuzenghui@...wei.com>,
"james.morse@....com" <james.morse@....com>,
"vschneid@...hat.com" <vschneid@...hat.com>,
"bristot@...hat.com" <bristot@...hat.com>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"alexandru.elisei@....com" <alexandru.elisei@....com>,
"suzuki.poulose@....com" <suzuki.poulose@....com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"mingo@...hat.com" <mingo@...hat.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>,
"bsegall@...gle.com" <bsegall@...gle.com>,
"mgorman@...e.de" <mgorman@...e.de>,
"arnd@...db.de" <arnd@...db.de>,
"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"will@...nel.org" <will@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-trace-kernel@...r.kernel.org"
<linux-trace-kernel@...r.kernel.org>,
Qun-wei Lin (林群崴)
<Qun-wei.Lin@...iatek.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"hyesoo.yu@...sung.com" <hyesoo.yu@...sung.com>,
"kcc@...gle.com" <kcc@...gle.com>,
"kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
"david@...hat.com" <david@...hat.com>,
Casper Li (李中榮) <casper.li@...iatek.com>,
"steven.price@....com" <steven.price@....com>,
Chinwen Chang (張錦文)
<chinwen.chang@...iatek.com>,
Kuan-Ying Lee (李冠穎)
<Kuan-Ying.Lee@...iatek.com>,
"eugenis@...gle.com" <eugenis@...gle.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"pcc@...gle.com" <pcc@...gle.com>,
"vincenzo.frascino@....com" <vincenzo.frascino@....com>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"anshuman.khandual@....com" <anshuman.khandual@....com>
Subject: Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage
reuse
On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
> Introduction
> ============
>
> Arm has implemented memory coloring in hardware, and the feature is called
> Memory Tagging Extensions (MTE). It works by embedding a 4-bit tag in bits
> 59..56 of a pointer, and storing this tag to a reserved memory location.
> When the pointer is dereferenced, the hardware compares the tag embedded in
> the pointer (logical tag) with the tag stored in memory (allocation tag).
>
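
For illustration, a minimal Python sketch of the tag comparison described
above. The helper names (logical_tag, with_tag, tag_check) and the sample
address are hypothetical; only the 4-bit width and the 59..56 bit position
come from the text. Real hardware performs this check on tagged accesses
itself, not in software.

```python
TAG_SHIFT = 56   # lowest bit of the tag field (bits 59..56)
TAG_MASK = 0xF   # the tag is 4 bits wide


def logical_tag(ptr: int) -> int:
    """Return the logical tag embedded in bits 59..56 of a 64-bit pointer."""
    return (ptr >> TAG_SHIFT) & TAG_MASK


def with_tag(ptr: int, tag: int) -> int:
    """Return ptr with its tag field replaced by tag (hypothetical helper)."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 4 bits")
    cleared = ptr & ~(TAG_MASK << TAG_SHIFT)
    return cleared | (tag << TAG_SHIFT)


def tag_check(ptr: int, allocation_tag: int) -> bool:
    """Model the comparison of the logical tag against the allocation tag."""
    return logical_tag(ptr) == allocation_tag


if __name__ == "__main__":
    p = with_tag(0x0000_FFFF_8000_1000, 0xA)  # made-up address
    print(hex(p), tag_check(p, 0xA), tag_check(p, 0x3))
```
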
> The relation between memory and where the tag for that memory is stored
> is static.
>
> The memory where the tags are stored has so far been inaccessible to
> Linux. This series aims to change that by adding support for using the
> tag storage memory only as data memory; tag storage memory cannot itself
> be tagged.
>
>
> Implementation
> ==============
>
> The series is based on v6.5-rc3 with these two patches cherry picked:
>
> - mm: Call arch_swap_restore() from unuse_pte():
>
>
> https://lore.kernel.org/all/20230523004312.1807357-3-pcc@google.com/
>
> - arm64: mte: Simplify swap tag restoration logic:
>
>
> https://lore.kernel.org/all/20230523004312.1807357-4-pcc@google.com/
>
> The above two patches are queued for the v6.6 merge window:
>
>
> https://lore.kernel.org/all/20230702123821.04e64ea2c04dd0fdc947bda3@linux-foundation.org/
>
> The entire series, including the above patches, can be cloned with:
>
> $ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
> -b arm-mte-dynamic-carveout-rfc-v1
>
> On the arm64 architecture side, an extension is being worked on that
> will
> clarify how MTE tag storage reuse should behave. The extension will
> be
> made public soon.
>
> On the Linux side, MTE tag storage reuse is accomplished with the
> following changes:
>
> 1. The tag storage memory is exposed to the memory allocator as a new
>    migratetype, MIGRATE_METADATA. It behaves similarly to MIGRATE_CMA, with
>    the restriction that it cannot be used to allocate tagged memory (tag
>    storage memory cannot be tagged). On tagged page allocation, the
>    corresponding tag storage is reserved via alloc_contig_range().
>
> 2. mprotect(PROT_MTE) is implemented by changing the pte prot to
>    PAGE_METADATA_NONE. When the page is next accessed, a fault is taken and
>    the corresponding tag storage is reserved.
>
> 3. When the code tries to copy tags to a page which doesn't have the tag
>    storage reserved, the tags are copied to an xarray and restored in
>    set_pte_at(), when the page is eventually mapped with the tag storage
>    reserved.
>
> KVM support has not been implemented yet because a non-MTE enabled VMA
> can back the memory of an MTE-enabled VM. Once there is a consensus on
> the right approach for the memory management support, I will add it.
>
> Explanations for the last two changes follow. The gist of it is that they
> were added mostly because of races, and it is my intention to make the
> code more robust.
>
> PAGE_METADATA_NONE was introduced to avoid races with mprotect(PROT_MTE).
> For example, migration can race with mprotect(PROT_MTE):
> - thread 0 initiates migration for a page in a non-MTE enabled VMA and a
>   destination page is allocated without tag storage.
> - thread 1 handles an mprotect(PROT_MTE), the VMA becomes tagged, and an
>   access turns the source page that is in the process of being migrated
>   into a tagged page.
> - thread 0 finishes migration and the destination page is mapped as tagged,
>   but without tag storage reserved.
> More details and examples can be found in the patches.
>
> This race is also related to how tag restoring is handled when tag storage
> is missing: when a tagged page is swapped out, the tags are saved in an
> xarray indexed by swp_entry.val. When a page is swapped back in, if there
> are tags corresponding to the swp_entry that the page will replace, the
> tags are unconditionally restored, even if the page will be mapped as
> untagged. Because the page will be mapped as untagged, tag storage was
> not reserved when the page was allocated to replace the swp_entry which
> has tags associated with it.
>
> To get around this, save the tags in a new xarray, this time indexed by
> pfn, and restore them when the same page is mapped as tagged.
>
> This also solves another race, this time with copy_highpage(). In the
> scenario where migration races with mprotect(PROT_MTE), before the page is
> mapped, the contents of the source page are copied to the destination.
> This includes tags, which will be copied to a page with missing tag
> storage, which can lead to data corruption if the missing tag storage is
> in use for data. So copy_highpage() has received a similar treatment to
> the swap code, and the source tags are copied in the xarray indexed by the
> destination page pfn.
>
>
> Overview of the patches
> =======================
>
> Patches 1-3 do some preparatory work by renaming a few functions and
> a gfp
> flag.
>
> Patches 4-12 are arch independent and introduce MIGRATE_METADATA to
> the
> page allocator.
>
> Patches 13-18 are arm64 specific and add support for detecting the
> tag
> storage region and onlining it with the MIGRATE_METADATA migratetype.
>
> Patches 19-24 are arch independent and modify the page allocator to call
> back into arch-dependent functions to reserve metadata storage for an
> allocation which requires metadata.
>
> Patches 25-28 are mostly arm64 specific and implement the reservation
> and
> freeing of tag storage on tagged page allocation. Patch #28 ("mm:
> sched:
> Introduce PF_MEMALLOC_ISOLATE") adds a current flag,
> PF_MEMALLOC_ISOLATE,
> which ignores page isolation limits; this is used by arm64 when
> reserving
> tag storage in the same patch.
>
> Patches 29-30 add arch independent support for doing
> mprotect(PROT_MTE)
> when metadata storage is enabled.
>
> Patches 31-37 are mostly arm64 specific and handle the restoring of
> tags
> when tag storage is missing. The exceptions are patches 32 (adds the
> arch_swap_prepare_to_restore() function) and 35 (add
> PAGE_METADATA_NONE
> support for THPs).
>
> Testing
> =======
>
> To enable MTE dynamic tag storage:
>
> - CONFIG_ARM64_MTE_TAG_STORAGE=y
> - system_supports_mte() returns true
> - kasan_hw_tags_enabled() returns false
> - correct DTB node (for the specification, see commit "arm64: mte:
> Reserve tag
> storage memory")
>
> Check dmesg for the message "MTE tag storage enabled" or grep for
> metadata
> in /proc/vmstat.
>
> I've tested the series using FVP with MTE enabled, but without
> support for
> dynamic tag storage reuse. To simulate it, I've added two fake tag
> storage
> regions in the DTB by splitting a 2GB region roughly into 33 slices
> of size
> 0x3e0_0000, and using 32 of them for tagged memory and one slice for
> tag
> storage:
>
> diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> index 60472d65a355..bd050373d6cf 100644
> --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> @@ -165,10 +165,28 @@ C1_L2: l2-cache1 {
> };
> };
>
> - memory@...00000 {
> + memory0: memory@...00000 {
> device_type = "memory";
> - reg = <0x00000000 0x80000000 0 0x80000000>,
> - <0x00000008 0x80000000 0 0x80000000>;
> + reg = <0x00 0x80000000 0x00 0x7c000000>;
> + };
> +
> + metadata0: metadata@...00000 {
> + compatible = "arm,mte-tag-storage";
> + reg = <0x00 0xfc000000 0x00 0x3e00000>;
> + block-size = <0x1000>;
> + memory = <&memory0>;
> + };
> +
> + memory1: memory@...000000 {
> + device_type = "memory";
> + reg = <0x08 0x80000000 0x00 0x7c000000>;
> + };
> +
> + metadata1: metadata@...000000 {
> + compatible = "arm,mte-tag-storage";
> + reg = <0x08 0xfc000000 0x00 0x3e00000>;
> + block-size = <0x1000>;
> + memory = <&memory1>;
> };
>
Hi Alexandru,
AFAIK, the above memory configuration means that there are two regions of
DRAM (0x80000000-0xfc000000 and 0x8_80000000-0x8_fc000000); this is called
the PDD memory map.
Document [1] lists the following constraints on tag memory:
| The following constraints apply to the tag regions in DRAM:
| 1. The tag region cannot be interleaved with the data region.
| The tag region must also be above the data region within DRAM.
|
| 2. The tag region in the physical address space cannot straddle
| multiple regions of a memory map.
|
| PDD memory map is not allowed to have part of the tag region between
| 2GB-4GB and another part between 34GB-64GB.
I'm not sure we can split the tag memory the way the above configuration
does. Or am I missing something?
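
As a sanity check on the numbers in the DTS fragment, here is a minimal
Python sketch. It assumes the MTE ratio of one 4-bit tag per 16-byte
granule, i.e. one byte of tag storage per 32 bytes of data; the Region
class and the helper names are hypothetical. It only checks that each fake
tag region is sized for, and sits above, its own data region. It does not
decide whether two separate tag regions are permitted by [1].

```python
from dataclasses import dataclass

# Assumption: 4-bit tag per 16-byte granule => 1 tag byte per 32 data bytes.
DATA_BYTES_PER_TAG_BYTE = 32


@dataclass(frozen=True)
class Region:
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size


# (data region, tag storage region) pairs taken from the quoted DTS diff.
LAYOUT = [
    (Region(0x0_8000_0000, 0x7C00_0000), Region(0x0_FC00_0000, 0x3E0_0000)),
    (Region(0x8_8000_0000, 0x7C00_0000), Region(0x8_FC00_0000, 0x3E0_0000)),
]


def covers(data: Region, tags: Region) -> bool:
    """True if the tag region is exactly large enough to tag the data region."""
    return tags.size * DATA_BYTES_PER_TAG_BYTE == data.size


def tags_above_data(data: Region, tags: Region) -> bool:
    """True if the tag region starts at or above the end of the data region."""
    return tags.base >= data.end


if __name__ == "__main__":
    for data, tags in LAYOUT:
        print(hex(data.base), hex(tags.base),
              covers(data, tags), tags_above_data(data, tags))
```
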
[1] https://developer.arm.com/documentation/101569/0300/?lang=en
(Section 5.4.6.1)
Thanks,
Kuan-Ying Lee
> reserved-memory {
>
>
> Alexandru Elisei (37):
> mm: page_alloc: Rename gfp_to_alloc_flags_cma -> gfp_to_alloc_flags_fast
> arm64: mte: Rework naming for tag manipulation functions
> arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED
> mm: Add MIGRATE_METADATA allocation policy
> mm: Add memory statistics for the MIGRATE_METADATA allocation policy
> mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA
> mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages
> mm: compaction: Account for free metadata pages in __compact_finished()
> mm: compaction: Handle metadata pages as source for direct compaction
> mm: compaction: Do not use MIGRATE_METADATA to replace pages with metadata
> mm: migrate/mempolicy: Allocate metadata-enabled destination page
> mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages
> arm64: mte: Reserve tag storage memory
> arm64: mte: Expose tag storage pages to the MIGRATE_METADATA freelist
> arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK
> arm64: mte: Move tag storage to MIGRATE_MOVABLE when MTE is disabled
> arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled
> arm64: mte: Check that tag storage blocks are in the same zone
> mm: page_alloc: Manage metadata storage on page allocation
> mm: compaction: Reserve metadata storage in compaction_alloc()
> mm: khugepaged: Handle metadata-enabled VMAs
> mm: shmem: Allocate metadata storage for in-memory filesystems
> mm: Teach vma_alloc_folio() about metadata-enabled VMAs
> mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA
> arm64: mte: Manage tag storage on page allocation
> arm64: mte: Perform CMOs for tag blocks on tagged page allocation/free
> arm64: mte: Reserve tag block for the zero page
> mm: sched: Introduce PF_MEMALLOC_ISOLATE
> mm: arm64: Define the PAGE_METADATA_NONE page protection
> mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE)
> mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing metadata storage
> mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()
> arm64: mte: swap/copypage: Handle tag restoring when missing tag storage
> arm64: mte: Handle fatal signal in reserve_metadata_storage()
> mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages
> KVM: arm64: Disable MTE is tag storage is enabled
> arm64: mte: Enable tag storage management
>
> arch/arm64/Kconfig | 13 +
> arch/arm64/include/asm/assembler.h | 10 +
> arch/arm64/include/asm/memory_metadata.h | 49 ++
> arch/arm64/include/asm/mte-def.h | 16 +-
> arch/arm64/include/asm/mte.h | 40 +-
> arch/arm64/include/asm/mte_tag_storage.h | 36 ++
> arch/arm64/include/asm/page.h | 5 +-
> arch/arm64/include/asm/pgtable-prot.h | 2 +
> arch/arm64/include/asm/pgtable.h | 33 +-
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/elfcore.c | 14 +-
> arch/arm64/kernel/hibernate.c | 46 +-
> arch/arm64/kernel/mte.c | 31 +-
> arch/arm64/kernel/mte_tag_storage.c | 667 +++++++++++++++++++++++
> arch/arm64/kernel/setup.c | 7 +
> arch/arm64/kvm/arm.c | 6 +-
> arch/arm64/lib/mte.S | 30 +-
> arch/arm64/mm/copypage.c | 26 +
> arch/arm64/mm/fault.c | 35 +-
> arch/arm64/mm/mteswap.c | 113 +++-
> fs/proc/meminfo.c | 8 +
> fs/proc/page.c | 1 +
> include/asm-generic/Kbuild | 1 +
> include/asm-generic/memory_metadata.h | 50 ++
> include/linux/gfp.h | 10 +
> include/linux/gfp_types.h | 14 +-
> include/linux/huge_mm.h | 6 +
> include/linux/kernel-page-flags.h | 1 +
> include/linux/migrate_mode.h | 1 +
> include/linux/mm.h | 12 +-
> include/linux/mmzone.h | 26 +-
> include/linux/page-flags.h | 1 +
> include/linux/pgtable.h | 19 +
> include/linux/sched.h | 2 +-
> include/linux/sched/mm.h | 13 +
> include/linux/vm_event_item.h | 5 +
> include/linux/vmstat.h | 2 +
> include/trace/events/mmflags.h | 5 +-
> mm/Kconfig | 5 +
> mm/compaction.c | 52 +-
> mm/huge_memory.c | 109 ++++
> mm/internal.h | 7 +
> mm/khugepaged.c | 7 +
> mm/memory.c | 180 +++++-
> mm/mempolicy.c | 7 +
> mm/migrate.c | 6 +
> mm/mm_init.c | 23 +-
> mm/mprotect.c | 46 ++
> mm/page_alloc.c | 136 ++++-
> mm/page_isolation.c | 19 +-
> mm/page_owner.c | 3 +-
> mm/shmem.c | 14 +-
> mm/show_mem.c | 4 +
> mm/swapfile.c | 4 +
> mm/vmscan.c | 3 +
> mm/vmstat.c | 13 +-
> 56 files changed, 1834 insertions(+), 161 deletions(-)
> create mode 100644 arch/arm64/include/asm/memory_metadata.h
> create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
> create mode 100644 arch/arm64/kernel/mte_tag_storage.c
> create mode 100644 include/asm-generic/memory_metadata.h
>