Message-ID: <0dc2afaf8a976ef8eb9af711fd941f1bbfd71321.camel@mediatek.com>
Date:   Wed, 13 Sep 2023 08:11:40 +0000
From:   Kuan-Ying Lee (李冠穎) 
        <Kuan-Ying.Lee@...iatek.com>
To:     "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
        "hughd@...gle.com" <hughd@...gle.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "maz@...nel.org" <maz@...nel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "rppt@...nel.org" <rppt@...nel.org>,
        "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
        "james.morse@....com" <james.morse@....com>,
        "vschneid@...hat.com" <vschneid@...hat.com>,
        "bristot@...hat.com" <bristot@...hat.com>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "alexandru.elisei@....com" <alexandru.elisei@....com>,
        "suzuki.poulose@....com" <suzuki.poulose@....com>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhiramat@...nel.org" <mhiramat@...nel.org>,
        "bsegall@...gle.com" <bsegall@...gle.com>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "arnd@...db.de" <arnd@...db.de>,
        "oliver.upton@...ux.dev" <oliver.upton@...ux.dev>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "will@...nel.org" <will@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-trace-kernel@...r.kernel.org" 
        <linux-trace-kernel@...r.kernel.org>,
        Qun-wei Lin (林群崴) 
        <Qun-wei.Lin@...iatek.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "hyesoo.yu@...sung.com" <hyesoo.yu@...sung.com>,
        "kcc@...gle.com" <kcc@...gle.com>,
        "kvmarm@...ts.linux.dev" <kvmarm@...ts.linux.dev>,
        "david@...hat.com" <david@...hat.com>,
        Casper Li (李中榮) <casper.li@...iatek.com>,
        "steven.price@....com" <steven.price@....com>,
        Chinwen Chang (張錦文) 
        <chinwen.chang@...iatek.com>,
        Kuan-Ying Lee (李冠穎) 
        <Kuan-Ying.Lee@...iatek.com>,
        "eugenis@...gle.com" <eugenis@...gle.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "pcc@...gle.com" <pcc@...gle.com>,
        "vincenzo.frascino@....com" <vincenzo.frascino@....com>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "anshuman.khandual@....com" <anshuman.khandual@....com>
Subject: Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage
 reuse

On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
> Introduction
> ============
> 
> Arm has implemented memory coloring in hardware, and the feature is called
> Memory Tagging Extension (MTE). It works by embedding a 4-bit tag in bits
> 59..56 of a pointer, and storing this tag to a reserved memory location.
> When the pointer is dereferenced, the hardware compares the tag embedded in
> the pointer (logical tag) with the tag stored in memory (allocation tag).
> 
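
As a rough illustration of the tag check described above (a sketch only,
not kernel code; the helper names are made up for this example), the
logical tag sits in bits 59..56 of a 64-bit pointer and is compared
against the allocation tag:

```python
TAG_SHIFT = 56   # lowest bit of the logical tag (bits 59..56)
TAG_MASK = 0xF   # the tag is 4 bits wide


def logical_tag(ptr: int) -> int:
    """Return the 4-bit tag embedded in bits 59..56 of a pointer."""
    return (ptr >> TAG_SHIFT) & TAG_MASK


def set_logical_tag(ptr: int, tag: int) -> int:
    """Return ptr with bits 59..56 replaced by tag (hypothetical helper)."""
    cleared = ptr & ~(TAG_MASK << TAG_SHIFT)
    return cleared | ((tag & TAG_MASK) << TAG_SHIFT)


def tags_match(ptr: int, allocation_tag: int) -> bool:
    """Model of the hardware check: logical tag versus allocation tag."""
    return logical_tag(ptr) == (allocation_tag & TAG_MASK)
```

In real hardware the comparison is done on each tagged access; the
functions here only model the bit layout.
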
> The relation between memory and where the tag for that memory is stored is
> static.
> 
> The memory where the tags are stored has so far been inaccessible to Linux.
> This series aims to change that, by adding support for using the tag storage
> memory only as data memory; tag storage memory cannot be itself tagged.
> 
> 
> Implementation
> ==============
> 
> The series is based on v6.5-rc3 with these two patches cherry picked:
> 
> - mm: Call arch_swap_restore() from unuse_pte():
>
>     https://lore.kernel.org/all/20230523004312.1807357-3-pcc@google.com/
>
> - arm64: mte: Simplify swap tag restoration logic:
>
>     https://lore.kernel.org/all/20230523004312.1807357-4-pcc@google.com/
>
> The above two patches are queued for the v6.6 merge window:
>
>     https://lore.kernel.org/all/20230702123821.04e64ea2c04dd0fdc947bda3@linux-foundation.org/
> 
> The entire series, including the above patches, can be cloned with:
> 
> $ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
> 	-b arm-mte-dynamic-carveout-rfc-v1
> 
> On the arm64 architecture side, an extension is being worked on that will
> clarify how MTE tag storage reuse should behave. The extension will be
> made public soon.
> 
> On the Linux side, MTE tag storage reuse is accomplished with the
> following changes:
> 
> 1. The tag storage memory is exposed to the memory allocator as a new
> migratetype, MIGRATE_METADATA. It behaves similarly to MIGRATE_CMA, with
> the restriction that it cannot be used to allocate tagged memory (tag
> storage memory cannot be tagged). On tagged page allocation, the
> corresponding tag storage is reserved via alloc_contig_range().
>
> 2. mprotect(PROT_MTE) is implemented by changing the pte prot to
> PAGE_METADATA_NONE. When the page is next accessed, a fault is taken and
> the corresponding tag storage is reserved.
>
> 3. When the code tries to copy tags to a page which doesn't have the tag
> storage reserved, the tags are copied to an xarray and restored in
> set_pte_at(), when the page is eventually mapped with the tag storage
> reserved.
> 
> KVM support has not been implemented yet, because a non-MTE enabled VMA
> can back the memory of an MTE-enabled VM. After there is a consensus on the
> right approach on the memory management support, I will add it.
> 
> Explanations for the last two changes follow. The gist of it is that they
> were added mostly because of races, and it is my intention to make the code
> more robust.
> 
> PAGE_METADATA_NONE was introduced to avoid races with mprotect(PROT_MTE).
> For example, migration can race with mprotect(PROT_MTE):
> - thread 0 initiates migration for a page in a non-MTE enabled VMA and a
>   destination page is allocated without tag storage.
> - thread 1 handles an mprotect(PROT_MTE), the VMA becomes tagged, and an
>   access turns the source page that is in the process of being migrated
>   into a tagged page.
> - thread 0 finishes migration and the destination page is mapped as tagged,
>   but without tag storage reserved.
> More details and examples can be found in the patches.
> 
> This race is also related to how tag restoring is handled when tag storage
> is missing: when a tagged page is swapped out, the tags are saved in an
> xarray indexed by swp_entry.val. When a page is swapped back in, if there
> are tags corresponding to the swp_entry that the page will replace, the
> tags are unconditionally restored, even if the page will be mapped as
> untagged. Because the page will be mapped as untagged, tag storage was
> not reserved when the page was allocated to replace the swp_entry which has
> tags associated with it.
> 
> To get around this, save the tags in a new xarray, this time indexed by
> pfn, and restore them when the same page is mapped as tagged.
> 
> This also solves another race, this time with copy_highpage(). In the
> scenario where migration races with mprotect(PROT_MTE), before the page is
> mapped, the contents of the source page are copied to the destination. And
> this includes tags, which will be copied to a page with missing tag
> storage, which can lead to data corruption if the missing tag storage is
> in use for data. So copy_highpage() has received a similar treatment to the
> swap code, and the source tags are copied to the xarray indexed by the
> destination page pfn.
> 
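
The pfn-indexed stash described above can be pictured with a toy model.
This is only a sketch under stated assumptions: a Python dict stands in
for the kernel xarray, and the class and method names are invented for
illustration and do not appear in the series.

```python
class TagStash:
    """Toy stand-in for the pfn-indexed xarray described above."""

    def __init__(self):
        self._by_pfn = {}

    def save(self, pfn: int, tags: bytes) -> None:
        # Tags are being copied to a page whose tag storage is not
        # reserved, so park them instead of writing to tag storage.
        self._by_pfn[pfn] = bytes(tags)

    def restore(self, pfn: int):
        # The page is now being mapped as tagged with tag storage
        # reserved: hand the parked tags back and drop the entry.
        return self._by_pfn.pop(pfn, None)
```

Usage follows the two steps in the text: save() when tags arrive for a
page without tag storage, restore() when that same pfn is later mapped
as tagged.
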
> 
> Overview of the patches
> =======================
> 
> Patches 1-3 do some preparatory work by renaming a few functions and a gfp
> flag.
>
> Patches 4-12 are arch independent and introduce MIGRATE_METADATA to the
> page allocator.
>
> Patches 13-18 are arm64 specific and add support for detecting the tag
> storage region and onlining it with the MIGRATE_METADATA migratetype.
>
> Patches 19-24 are arch independent and modify the page allocator to
> call back into arch-dependent functions to reserve metadata storage for an
> allocation which requires metadata.
>
> Patches 25-28 are mostly arm64 specific and implement the reservation and
> freeing of tag storage on tagged page allocation. Patch #28 ("mm: sched:
> Introduce PF_MEMALLOC_ISOLATE") adds a current flag, PF_MEMALLOC_ISOLATE,
> which ignores page isolation limits; this is used by arm64 when reserving
> tag storage in the same patch.
>
> Patches 29-30 add arch independent support for doing mprotect(PROT_MTE)
> when metadata storage is enabled.
>
> Patches 31-37 are mostly arm64 specific and handle the restoring of tags
> when tag storage is missing. The exceptions are patches 32 (adds the
> arch_swap_prepare_to_restore() function) and 35 (adds PAGE_METADATA_NONE
> support for THPs).
> 
> Testing
> =======
> 
> To enable MTE dynamic tag storage:
> 
> - CONFIG_ARM64_MTE_TAG_STORAGE=y
> - system_supports_mte() returns true
> - kasan_hw_tags_enabled() returns false
> - correct DTB node (for the specification, see commit "arm64: mte: Reserve
>   tag storage memory")
> 
> Check dmesg for the message "MTE tag storage enabled" or grep for metadata
> in /proc/vmstat.
> 
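
The /proc/vmstat check above could be scripted roughly as follows. This
is a sketch: the exact counter names added by the series are not given
here, so the code only filters on the substring "metadata", and both
function names are made up for illustration.

```python
def metadata_counters(vmstat_text: str) -> dict:
    """Return the "name value" pairs whose name mentions metadata."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # /proc/vmstat lines have the form "<name> <integer>".
        if len(parts) == 2 and "metadata" in parts[0]:
            counters[parts[0]] = int(parts[1])
    return counters


def read_metadata_counters(path: str = "/proc/vmstat") -> dict:
    """Read the file and filter it; an empty dict means no such counters."""
    with open(path) as f:
        return metadata_counters(f.read())
```

An empty result from read_metadata_counters() would suggest that the
running kernel does not expose any metadata counters.
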
> I've tested the series using FVP with MTE enabled, but without support for
> dynamic tag storage reuse. To simulate it, I've added two fake tag storage
> regions in the DTB by splitting a 2GB region roughly into 33 slices of size
> 0x3e0_0000, and using 32 of them for tagged memory and one slice for tag
> storage:
> 
> diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> index 60472d65a355..bd050373d6cf 100644
> --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> @@ -165,10 +165,28 @@ C1_L2: l2-cache1 {
>                 };
>         };
>  
> -       memory@...00000 {
> +       memory0: memory@...00000 {
>                 device_type = "memory";
> -               reg = <0x00000000 0x80000000 0 0x80000000>,
> -                     <0x00000008 0x80000000 0 0x80000000>;
> +               reg = <0x00 0x80000000 0x00 0x7c000000>;
> +       };
> +
> +       metadata0: metadata@...00000  {
> +               compatible = "arm,mte-tag-storage";
> +               reg = <0x00 0xfc000000 0x00 0x3e00000>;
> +               block-size = <0x1000>;
> +               memory = <&memory0>;
> +       };
> +
> +       memory1: memory@...000000 {
> +               device_type = "memory";
> +               reg = <0x08 0x80000000 0x00 0x7c000000>;
> +       };
> +
> +       metadata1: metadata@...000000  {
> +               compatible = "arm,mte-tag-storage";
> +               reg = <0x08 0xfc000000 0x00 0x3e00000>;
> +               block-size = <0x1000>;
> +               memory = <&memory1>;
>         };
>  

Hi Alexandru,

AFAIK, the above memory configuration means that there are two regions
of DRAM (0x80000000-0xfc000000 and 0x8_80000000-0x8_fc000000), and this
is called the PDD memory map.
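
A quick sanity check of the address arithmetic in the DTB above (a
sketch, assuming the 16-byte MTE tag granule and 4-bit tags, so that one
byte of tag storage covers 32 bytes of data; the helper names are made
up for illustration):

```python
TAG_GRANULE = 16   # bytes of data covered by one allocation tag
TAG_BITS = 4       # size of one allocation tag
DATA_PER_TAG_BYTE = TAG_GRANULE * 8 // TAG_BITS   # 32 data bytes per tag byte


def tag_storage_size(data_size: int) -> int:
    """Bytes of tag storage needed to tag data_size bytes of memory."""
    return data_size // DATA_PER_TAG_BYTE


def layout(base: int, data_size: int):
    """Return (end of data, end of tag storage) when tags sit right above."""
    data_end = base + data_size
    return data_end, data_end + tag_storage_size(data_size)


# The two memory nodes from the DTB: (base, size).
regions = [(0x0_8000_0000, 0x7C00_0000), (0x8_8000_0000, 0x7C00_0000)]
for base, size in regions:
    data_end, tag_end = layout(base, size)
    print(f"data {base:#x}-{data_end:#x}, tags {data_end:#x}-{tag_end:#x}")
```

This reproduces the 0x3e00000 tag storage size and the 0xfc000000 and
0x8_fc000000 tag storage bases used in the two metadata nodes.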

Document [1] lists the following constraints on tag memory:

| The following constraints apply to the tag regions in DRAM:
| 1. The tag region cannot be interleaved with the data region.
| The tag region must also be above the data region within DRAM.
|
| 2. The tag region in the physical address space cannot straddle
| multiple regions of a memory map.
|
| PDD memory map is not allowed to have part of the tag region between
| 2GB-4GB and another part between 34GB-64GB.


I'm not sure whether the tag memory can be split up as in the above
configuration. Or am I missing something?

[1] https://developer.arm.com/documentation/101569/0300/?lang=en
(Section 5.4.6.1)

Thanks,
Kuan-Ying Lee
>         reserved-memory {
> 
> 
> Alexandru Elisei (37):
>   mm: page_alloc: Rename gfp_to_alloc_flags_cma ->
>     gfp_to_alloc_flags_fast
>   arm64: mte: Rework naming for tag manipulation functions
>   arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED
>   mm: Add MIGRATE_METADATA allocation policy
>   mm: Add memory statistics for the MIGRATE_METADATA allocation
>     policy
>   mm: page_alloc: Allocate from movable pcp lists only if
>     ALLOC_FROM_METADATA
>   mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages
>   mm: compaction: Account for free metadata pages in
>     __compact_finished()
>   mm: compaction: Handle metadata pages as source for direct
>     compaction
>   mm: compaction: Do not use MIGRATE_METADATA to replace pages with
>     metadata
>   mm: migrate/mempolicy: Allocate metadata-enabled destination page
>   mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages
>   arm64: mte: Reserve tag storage memory
>   arm64: mte: Expose tag storage pages to the MIGRATE_METADATA
>     freelist
>   arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK
>   arm64: mte: Move tag storage to MIGRATE_MOVABLE when MTE is
>     disabled
>   arm64: mte: Disable dynamic tag storage management if HW KASAN is
>     enabled
>   arm64: mte: Check that tag storage blocks are in the same zone
>   mm: page_alloc: Manage metadata storage on page allocation
>   mm: compaction: Reserve metadata storage in compaction_alloc()
>   mm: khugepaged: Handle metadata-enabled VMAs
>   mm: shmem: Allocate metadata storage for in-memory filesystems
>   mm: Teach vma_alloc_folio() about metadata-enabled VMAs
>   mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA
>   arm64: mte: Manage tag storage on page allocation
>   arm64: mte: Perform CMOs for tag blocks on tagged page
>     allocation/free
>   arm64: mte: Reserve tag block for the zero page
>   mm: sched: Introduce PF_MEMALLOC_ISOLATE
>   mm: arm64: Define the PAGE_METADATA_NONE page protection
>   mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE)
>   mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing
>     metadata storage
>   mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()
>   arm64: mte: swap/copypage: Handle tag restoring when missing tag
>     storage
>   arm64: mte: Handle fatal signal in reserve_metadata_storage()
>   mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages
>   KVM: arm64: Disable MTE is tag storage is enabled
>   arm64: mte: Enable tag storage management
> 
>  arch/arm64/Kconfig                       |  13 +
>  arch/arm64/include/asm/assembler.h       |  10 +
>  arch/arm64/include/asm/memory_metadata.h |  49 ++
>  arch/arm64/include/asm/mte-def.h         |  16 +-
>  arch/arm64/include/asm/mte.h             |  40 +-
>  arch/arm64/include/asm/mte_tag_storage.h |  36 ++
>  arch/arm64/include/asm/page.h            |   5 +-
>  arch/arm64/include/asm/pgtable-prot.h    |   2 +
>  arch/arm64/include/asm/pgtable.h         |  33 +-
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/elfcore.c              |  14 +-
>  arch/arm64/kernel/hibernate.c            |  46 +-
>  arch/arm64/kernel/mte.c                  |  31 +-
>  arch/arm64/kernel/mte_tag_storage.c      | 667 +++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                |   7 +
>  arch/arm64/kvm/arm.c                     |   6 +-
>  arch/arm64/lib/mte.S                     |  30 +-
>  arch/arm64/mm/copypage.c                 |  26 +
>  arch/arm64/mm/fault.c                    |  35 +-
>  arch/arm64/mm/mteswap.c                  | 113 +++-
>  fs/proc/meminfo.c                        |   8 +
>  fs/proc/page.c                           |   1 +
>  include/asm-generic/Kbuild               |   1 +
>  include/asm-generic/memory_metadata.h    |  50 ++
>  include/linux/gfp.h                      |  10 +
>  include/linux/gfp_types.h                |  14 +-
>  include/linux/huge_mm.h                  |   6 +
>  include/linux/kernel-page-flags.h        |   1 +
>  include/linux/migrate_mode.h             |   1 +
>  include/linux/mm.h                       |  12 +-
>  include/linux/mmzone.h                   |  26 +-
>  include/linux/page-flags.h               |   1 +
>  include/linux/pgtable.h                  |  19 +
>  include/linux/sched.h                    |   2 +-
>  include/linux/sched/mm.h                 |  13 +
>  include/linux/vm_event_item.h            |   5 +
>  include/linux/vmstat.h                   |   2 +
>  include/trace/events/mmflags.h           |   5 +-
>  mm/Kconfig                               |   5 +
>  mm/compaction.c                          |  52 +-
>  mm/huge_memory.c                         | 109 ++++
>  mm/internal.h                            |   7 +
>  mm/khugepaged.c                          |   7 +
>  mm/memory.c                              | 180 +++++-
>  mm/mempolicy.c                           |   7 +
>  mm/migrate.c                             |   6 +
>  mm/mm_init.c                             |  23 +-
>  mm/mprotect.c                            |  46 ++
>  mm/page_alloc.c                          | 136 ++++-
>  mm/page_isolation.c                      |  19 +-
>  mm/page_owner.c                          |   3 +-
>  mm/shmem.c                               |  14 +-
>  mm/show_mem.c                            |   4 +
>  mm/swapfile.c                            |   4 +
>  mm/vmscan.c                              |   3 +
>  mm/vmstat.c                              |  13 +-
>  56 files changed, 1834 insertions(+), 161 deletions(-)
>  create mode 100644 arch/arm64/include/asm/memory_metadata.h
>  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
>  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
>  create mode 100644 include/asm-generic/memory_metadata.h
> 
