lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <cover.1681430907.git.ackerleytng@google.com>
Date:   Fri, 14 Apr 2023 00:11:49 +0000
From:   Ackerley Tng <ackerleytng@...gle.com>
To:     kvm@...r.kernel.org, linux-api@...r.kernel.org,
        linux-arch@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, qemu-devel@...gnu.org
Cc:     aarcange@...hat.com, ak@...ux.intel.com, akpm@...ux-foundation.org,
        arnd@...db.de, bfields@...ldses.org, bp@...en8.de,
        chao.p.peng@...ux.intel.com, corbet@....net, dave.hansen@...el.com,
        david@...hat.com, ddutile@...hat.com, dhildenb@...hat.com,
        hpa@...or.com, hughd@...gle.com, jlayton@...nel.org,
        jmattson@...gle.com, joro@...tes.org, jun.nakajima@...el.com,
        kirill.shutemov@...ux.intel.com, linmiaohe@...wei.com,
        luto@...nel.org, mail@...iej.szmigiero.name, mhocko@...e.com,
        michael.roth@....com, mingo@...hat.com, naoya.horiguchi@....com,
        pbonzini@...hat.com, qperret@...gle.com, rppt@...nel.org,
        seanjc@...gle.com, shuah@...nel.org, steven.price@....com,
        tabba@...gle.com, tglx@...utronix.de, vannapurve@...gle.com,
        vbabka@...e.cz, vkuznets@...hat.com, wanpengli@...cent.com,
        wei.w.wang@...el.com, x86@...nel.org, yu.c.zhang@...ux.intel.com,
        muchun.song@...ux.dev, feng.tang@...el.com, brgerst@...il.com,
        rdunlap@...radead.org, masahiroy@...nel.org,
        mailhol.vincent@...adoo.fr, Ackerley Tng <ackerleytng@...gle.com>
Subject: [RFC PATCH 0/6] Setting memory policy for restrictedmem file

Hello,

This patchset builds upon the memfd_restricted() system call that was
discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch
series [1].

The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-set-memory-policy

In this patchset, a new syscall is introduced, which allows userspace
to set the memory policy (e.g. NUMA bindings) for a restrictedmem
file, to the granularity of offsets within the file.

The offset/length tuple is termed a file_range which is passed to the
kernel via a pointer to get around the limit of 6 arguments for a
syscall.

The following other approaches were also considered:

1. Pre-configuring a mount with a memory policy and providing that
   mount to memfd_restricted() as proposed at [2].
    + Pro: It allows choice of a specific backing mount with custom
      memory policy configurations
    + Con: Will need to create an entire new mount just to set memory
      policy for a restrictedmem file; files on the same mount cannot
      have different memory policies.

2. Passing memory policy to the memfd_restricted() syscall at creation time.
    + Pro: Only need to make a single syscall to create a file with a
      given memory policy
    + Con: At creation time, the kernel doesn’t know the size of the
      restrictedmem file. Given that memory policy is stored in the
      inode based on ranges (start, end), it is awkward for the kernel
      to store the memory policy and then add hooks to set the memory
      policy when allocation is done.

3. A more generic fbind(): it seems like this new functionality is
   really only needed for restrictedmem files, hence a separate,
   specific syscall was proposed to avoid complexities with handling
   conflicting policies that may be specified via other syscalls like
   mbind()

TODOs

+ Return -EINVAL if file_range is not within the size of the file and
  tests for this

Dependencies:

+ Chao’s work on UPM [3]

[1] https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/
[2] https://lore.kernel.org/lkml/cover.1681176340.git.ackerleytng@google.com/T/
[3] https://github.com/chao-p/linux/commits/privmem-v11.5

---

Ackerley Tng (6):
  mm: shmem: Refactor out shmem_shared_policy() function
  mm: mempolicy: Refactor out mpol_init_from_nodemask
  mm: mempolicy: Refactor out __mpol_set_shared_policy()
  mm: mempolicy: Add and expose mpol_create
  mm: restrictedmem: Add memfd_restricted_bind() syscall
  selftests: mm: Add selftest for memfd_restricted_bind()

 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 include/linux/mempolicy.h                     |   4 +
 include/linux/shmem_fs.h                      |   7 +
 include/linux/syscalls.h                      |   5 +
 include/uapi/asm-generic/unistd.h             |   5 +-
 include/uapi/linux/mempolicy.h                |   7 +-
 kernel/sys_ni.c                               |   1 +
 mm/mempolicy.c                                | 100 ++++++++++---
 mm/restrictedmem.c                            |  75 ++++++++++
 mm/shmem.c                                    |  10 +-
 scripts/checksyscalls.sh                      |   1 +
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   8 +
 .../selftests/mm/memfd_restricted_bind.c      | 139 ++++++++++++++++++
 .../mm/restrictedmem_testmod/Makefile         |  21 +++
 .../restrictedmem_testmod.c                   |  89 +++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   6 +
 18 files changed, 454 insertions(+), 27 deletions(-)
 create mode 100644 tools/testing/selftests/mm/memfd_restricted_bind.c
 create mode 100644 tools/testing/selftests/mm/restrictedmem_testmod/Makefile
 create mode 100644 tools/testing/selftests/mm/restrictedmem_testmod/restrictedmem_testmod.c

--
2.40.0.634.g4ca3ef3211-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ