Message-ID: <CAKbZUD25mwVXowDcN1Cj5Op9wRAopYhYZcesR0tk2r_Wn-d95g@mail.gmail.com>
Date:   Sat, 2 Dec 2023 14:50:37 +0000
From:   Pedro Falcato <pedro.falcato@...il.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Weixi Zhu <weixi.zhu@...wei.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        weixi.zhu@...neuler.sh, mgorman@...e.de, jglisse@...hat.com,
        rcampbell@...dia.com, jhubbard@...dia.com, apopple@...dia.com,
        mhairgrove@...dia.com, ziy@...dia.com, alexander.deucher@....com,
        christian.koenig@....com, Xinhui.Pan@....com,
        amd-gfx@...ts.freedesktop.org, Felix.Kuehling@....com,
        ogabbay@...nel.org, dri-devel@...ts.freedesktop.org,
        jgg@...dia.com, leonro@...dia.com, zhenyuw@...ux.intel.com,
        zhi.a.wang@...el.com, intel-gvt-dev@...ts.freedesktop.org,
        intel-gfx@...ts.freedesktop.org, jani.nikula@...ux.intel.com,
        joonas.lahtinen@...ux.intel.com, rodrigo.vivi@...el.com,
        tvrtko.ursulin@...ux.intel.com
Subject: Re: [RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to
 track address mapping status

On Fri, Dec 1, 2023 at 9:23 AM David Hildenbrand <david@...hat.com> wrote:
>
> On 28.11.23 13:50, Weixi Zhu wrote:
> > This patch adds an abstraction layer, struct vm_object, that maintains
> > per-process virtual-to-physical mapping status stored in struct gm_mapping.
> > For example, a virtual page may be mapped to a CPU physical page or to a
> > device physical page. Struct vm_object effectively maintains an
> > arch-independent page table, which is defined as a "logical page table".
> > While arch-dependent page table used by a real MMU is named a "physical
> > page table". The logical page table is useful if Linux core MM is extended
> > to handle a unified virtual address space with external accelerators using
> > customized MMUs.
>
> Which raises the question why we are dealing with anonymous memory at
> all? Why not go for shmem if you are already only special-casing VMAs
> with a MMAP flag right now?
>
> That would maybe avoid having to introduce controversial BSD design
> concepts into Linux, that feel like going a step backwards in time to me
> and adding *more* MM complexity.
>
> >
> > In this patch, struct vm_object utilizes a radix
> > tree (xarray) to track where a virtual page is mapped to. This adds extra
> > memory consumption from xarray, but provides a nice abstraction to isolate
> > mapping status from the machine-dependent layer (PTEs). Besides supporting
> > accelerators with external MMUs, struct vm_object is planned to further
> > union with i_pages in struct address_mapping for file-backed memory.
>
> A file already has a tree structure (pagecache) to manage the pages that
> are theoretically mapped. It's easy to translate from a VMA to a page
> inside that tree structure that is currently not present in page tables.
>
> Why the need for that tree structure if you can just remove anon memory
> from the picture?
>
> >
> > The idea of struct vm_object is originated from FreeBSD VM design, which
> > provides a unified abstraction for anonymous memory, file-backed memory,
> > page cache and etc[1].
>
> :/
>
> > Currently, Linux utilizes a set of hierarchical page walk functions to
> > abstract page table manipulations of different CPU architecture. The
> > problem happens when a device wants to reuse Linux MM code to manage its
> > page table -- the device page table may not be accessible to the CPU.
> > Existing solution like Linux HMM utilizes the MMU notifier mechanisms to
> > invoke device-specific MMU functions, but relies on encoding the mapping
> > status on the CPU page table entries. This entangles machine-independent
> > code with machine-dependent code, and also brings unnecessary restrictions.
>
> Why? We have primitives to walk arch page tables in a non-arch-specific
> fashion and are using them all over the place.
>
> We even have various mechanisms to map something into the page tables
> and get the CPU to fault on it, as if it is inaccessible (PROT_NONE as
> used for NUMA balancing, fake swap entries).
>
> > The PTE size and format vary arch by arch, which harms the extensibility.
>
> Not really.
>
> We might have some features limited to some architectures because of the
> lack of PTE bits. And usually the problem is that people don't care
> enough about enabling these features on older architectures.
>
> If we ever *really* need more space for sw-defined data, it would be
> possible to allocate auxiliary data for page tables only where required
> (where the features apply), instead of crafting a completely new,
> auxiliary data structure with its own locking.
>
> So far it was not required to enable the feature we need on the
> architectures we care about.
>
> >
> > [1] https://docs.freebsd.org/en/articles/vm-design/
>
> In the cover letter you have:
>
> "The future plan of logical page table is to provide a generic
> abstraction layer that support common anonymous memory (I am looking at
> you, transparent huge pages) and file-backed memory."
>
> Which I doubt will happen; there is little interest in making anonymous
> memory management slower, more serialized, and wasting more memory on
> metadata.
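The "logical page table" debated above is, per the patch description, a per-process radix tree (xarray) recording where each virtual page is mapped. The following is a minimal userspace sketch of that idea, meant only to make the metadata cost David raises concrete. It assumes 4 KiB pages, a 48-bit address space, and 64-slot nodes loosely modeled on the xarray's default fan-out. The gm_state values and the gm_mapping layout are hypothetical stand-ins, not the RFC's actual definitions. Unlike the kernel's xarray, this tree has a fixed height and never shrinks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT  12                 /* assumption: 4 KiB pages */
#define CHUNK_SHIFT 6                  /* 64 slots per node, like the xarray default */
#define CHUNK_SIZE  (1u << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1)
#define LEVELS      6                  /* 6 x 6 bits = 36-bit page index (48-bit VA) */

/* Hypothetical per-page mapping status, standing in for struct gm_mapping. */
enum gm_state { GM_NONE = 0, GM_CPU, GM_DEVICE };

struct gm_mapping {
    enum gm_state state;
    uint64_t pfn;                      /* CPU or device physical frame number */
};

struct node {
    void *slots[CHUNK_SIZE];           /* child nodes, or gm_mapping records at level 0 */
};

struct vm_object_sketch {
    struct node *root;                 /* node indexed by the top 6 index bits */
    size_t nodes;                      /* metadata cost: nodes allocated so far */
};

static size_t slot_index(uint64_t index, int level)
{
    assert(level >= 0 && level < LEVELS);
    return (size_t)((index >> (level * CHUNK_SHIFT)) & CHUNK_MASK);
}

static struct node *new_node(struct vm_object_sketch *obj)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n)
        obj->nodes++;
    return n;
}

/* Record that the page containing 'va' is mapped to 'pfn' in state 'st'. */
static int gm_set(struct vm_object_sketch *obj, uint64_t va, enum gm_state st, uint64_t pfn)
{
    uint64_t index = va >> PAGE_SHIFT;
    if (!obj->root && !(obj->root = new_node(obj)))
        return -1;
    struct node *n = obj->root;
    for (int level = LEVELS - 1; level > 0; level--) {
        size_t i = slot_index(index, level);
        if (!n->slots[i] && !(n->slots[i] = new_node(obj)))
            return -1;
        n = n->slots[i];
    }
    size_t leaf = slot_index(index, 0);
    struct gm_mapping *m = n->slots[leaf];
    if (!m) {
        m = calloc(1, sizeof(*m));
        if (!m)
            return -1;
        n->slots[leaf] = m;
    }
    m->state = st;
    m->pfn = pfn;
    return 0;
}

/* Return the mapping record for 'va', or NULL if nothing was recorded. */
static struct gm_mapping *gm_lookup(const struct vm_object_sketch *obj, uint64_t va)
{
    uint64_t index = va >> PAGE_SHIFT;
    struct node *n = obj->root;
    for (int level = LEVELS - 1; n && level > 0; level--)
        n = n->slots[slot_index(index, level)];
    return n ? n->slots[slot_index(index, 0)] : NULL;
}

static void free_level(struct node *n, int level)
{
    if (!n)
        return;
    for (size_t i = 0; i < CHUNK_SIZE; i++) {
        if (level > 0)
            free_level(n->slots[i], level - 1);
        else
            free(n->slots[i]);         /* level 0 slots hold gm_mapping records */
    }
    free(n);
}

static void gm_destroy(struct vm_object_sketch *obj)
{
    free_level(obj->root, LEVELS - 1);
    obj->root = NULL;
    obj->nodes = 0;
}
```

Recording a single page in this sketch allocates six 512-byte nodes on a typical LP64 build, plus the leaf record. Densely populated ranges amortize that, but it is all metadata kept in addition to the CPU page tables.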

Also worth noting that:

1) Mach VM vm_objects (which FreeBSD inherited via the old BSD) aren't
quite what's being described here; rather, they are roughly a
replacement for both anon_vma and address_space[1]. Much as on Linux,
the VM takes pages from vm_objects and maps them into page tables
using pmap (the big difference is anon memory, whose bookkeeping lives
in the page tables on Linux).

2) These vm_objects were a horrendous mistake (see CoW chaining), and
FreeBSD has to go to great lengths to make them tolerable. The UVM
paper/dissertation (by Charles Cranor) discusses these issues at
length, and 20 years later they still apply.

3) Despite Linux MM having its warts, it's probably correct to
consider it a solid improvement over FreeBSD MM or NetBSD UVM.
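The CoW-chaining problem in point 2 can be shown with a toy model. In the Mach-derived design, when a private mapping is copied on write (for example across fork()), a shadow object is placed in front of the object being copied, and a fault searches the chain from the newest object to the oldest until it finds the page. The sketch below is a hypothetical simplification (fixed-size page arrays, no locking, no chain collapsing); its names are illustrative and it is not FreeBSD's actual struct vm_object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NPAGES 16                      /* toy object size, in pages */

/* Hypothetical, heavily simplified Mach-style VM object. */
struct vm_obj {
    void *pages[NPAGES];               /* pages this object owns, or NULL */
    struct vm_obj *backing;            /* older object being shadowed, or NULL */
};

/* Put a new, empty shadow object in front of 'orig' (copy-on-write setup). */
static struct vm_obj *shadow(struct vm_obj *orig)
{
    struct vm_obj *s = calloc(1, sizeof(*s));
    if (s)
        s->backing = orig;
    return s;
}

/* Fault-time lookup: search newest to oldest; *depth = objects visited. */
static void *chain_lookup(const struct vm_obj *obj, size_t pindex, int *depth)
{
    assert(pindex < NPAGES);
    int visited = 0;
    for (; obj; obj = obj->backing) {
        visited++;
        if (obj->pages[pindex]) {
            *depth = visited;
            return obj->pages[pindex];
        }
    }
    *depth = visited;
    return NULL;
}

/* Write fault: the newest object receives its own private copy of the page. */
static void cow_write(struct vm_obj *top, size_t pindex, void *copy)
{
    assert(pindex < NPAGES);
    top->pages[pindex] = copy;
}

/* Free every object in the chain (the toy pages themselves are not owned). */
static void free_chain(struct vm_obj *obj)
{
    while (obj) {
        struct vm_obj *next = obj->backing;
        free(obj);
        obj = next;
    }
}
```

Each level of shadowing adds another object a fault may have to visit, and a page that only an intermediate object still owns keeps that object alive. Safely collapsing such chains is part of the complexity the UVM work describes.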

And, finally, randomly tacking on core MM concepts from other systems
is at best a *really weird* idea, particularly when those concepts
aren't even what was stated!

[1] If you really can't use PTEs, I don't see why you can't use file
mappings and/or some vm_operations_struct workarounds, when the
patch's vm_object is literally just an xarray with a different name.
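The file-mapping route rests on the fact that a file mapping already has a tree to consult: the faulting address is turned into a page-cache index with plain arithmetic and then looked up in the file's page cache (the i_pages xarray of struct address_space). A minimal userspace sketch of the translation step follows. It mirrors the arithmetic of the kernel's linear_page_index() for non-hugetlb VMAs; vma_sketch and file_page_index() are hypothetical names, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the struct vm_area_struct fields involved. */
struct vma_sketch {
    uint64_t vm_start;                 /* first address of the mapping */
    uint64_t vm_end;                   /* one past the last address */
    uint64_t vm_pgoff;                 /* file offset of vm_start, in pages */
};

/* Page-cache index of the file page backing 'address'. */
static uint64_t file_page_index(const struct vma_sketch *vma, uint64_t address)
{
    assert(address >= vma->vm_start && address < vma->vm_end);
    return vma->vm_pgoff + ((address - vma->vm_start) >> PAGE_SHIFT);
}
```

For example, a VMA starting at 0x7f0000000000 with vm_pgoff 16 maps address 0x7f0000003000 to page-cache index 19.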

-- 
Pedro
