Date:   Mon, 7 Dec 2020 13:02:46 +0100
From:   David Hildenbrand
To:     Dan Williams,
        Yulei Zhang
Subject: Re: [RFC V2 00/37] Enhance memory utilization with DMEMFS

On 07.12.20 12:30, wrote:
> From: Yulei Zhang <>
> In the current system each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But due to the memory usage rapidly growing in cloud environment,
> we find the resource consuming for page structure storage becomes
> more and more remarkable. So is it possible that we could reclaim
> such memory and make it reusable?
> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
> Dmemfs (Direct Memory filesystem) is device memory or reserved
> memory based filesystem. This kind of memory is special as it
> is not managed by kernel and most important it is without 'struct page'.
> Therefore we can leverage the extra memory from the host system
> to support more tenants in our cloud service.

"is not managed by kernel": well, it obviously is managed by the
kernel. It's just not managed by the buddy allocator ;)

How is this different from using "mem=X" and mapping the relevant memory
directly into applications? Is this "simply" a control instance on top
that makes sure unprivileged processes can access it without stepping on
each other's feet? Is that the reason why it's called a "file system"?
(an example would have helped here, showing how it's used)
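
For comparison, the "mem=X" alternative can be sketched as follows. This
is only an illustration under stated assumptions: the host was booted
with "mem=X" so that RAM above X is left out of the kernel's memory map,
the caller is root, and the kernel configuration permits mapping that
range through /dev/mem. The helper names aligned_window() and
map_physical() are hypothetical, and any physical address passed in is
the caller's responsibility.

```python
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE


def aligned_window(phys_addr, length):
    """Return (offset, size, delta) for a page-aligned window that
    covers the byte range [phys_addr, phys_addr + length)."""
    offset = phys_addr & ~(PAGE_SIZE - 1)   # round start down to a page
    delta = phys_addr - offset              # where the range starts inside the window
    size = -(-(delta + length) // PAGE_SIZE) * PAGE_SIZE  # round up to whole pages
    return offset, size, delta


def map_physical(phys_addr, length, path="/dev/mem"):
    """Map a physical range that the kernel does not manage.

    Assumption: the range lies above the "mem=X" limit and /dev/mem
    access to it is allowed. Nothing here arbitrates between users,
    which is the gap a control instance would have to fill.
    """
    offset, size, delta = aligned_window(phys_addr, length)
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, size, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    return mem, delta
```

Only aligned_window() can be exercised without privileges; map_physical()
is not called here because it needs root and a host booted accordingly.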

It's worth noting that memory hotunplug, memory poisoning and probably
more are currently fundamentally incompatible with this approach, which
should be pointed out in the cover letter.

Also, I think something similar can be obtained by using dax/hmat
infrastructure with "memmap=", at least I remember a talk where this was
discussed (but not sure if they modified the firmware to expose selected
memory as soft-reserved - we would only need a cmdline parameter to
achieve the same - Dan might know more).

> As the figure below shows, we use a kernel boot parameter 'dmem='
> to reserve the system memory when the host system boots up, the
> remaining system memory is still managed by system memory management
> which is associated with "struct page", the reserved memory
> will be managed by dmem and assigned to guest system, the details
> can be checked in /Documentation/admin-guide/kernel-parameters.txt.
>    +------------------+--------------------------------------+
>    |  system memory   |     memory for guest system          | 
>    +------------------+--------------------------------------+
>     |                                   |
>     v                                   |
> struct page                             |
>     |                                   |
>     v                                   v
>     system mem management             dmem  
> And during the usage, the dmemfs will handle the memory request to
> allocate and free the reserved memory on each NUMA node, the user 
> space application could leverage the mmap interface to access the 
> memory, and kernel modules such as kvm and vfio would be able to pin
> the memory through follow_pfn() and get_user_page() in different given
> page size granularities.

I cannot say that I really like this approach. I really prefer the
proposal to free up most vmemmap pages for huge/gigantic pages instead,
if reducing the memmap size is all this is about.
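
To put rough numbers on the memmap size under discussion, here is a
back-of-the-envelope sketch. It assumes 4 KiB base pages and a 64-byte
struct page, which is typical for x86-64 but depends on the kernel
configuration; the function names are made up for illustration.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB base pages
STRUCT_PAGE_SIZE = 64   # assumption: typical sizeof(struct page) on x86-64

MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memmap_bytes(ram_bytes):
    """Bytes of struct page metadata needed to describe ram_bytes of RAM."""
    return (ram_bytes // PAGE_SIZE) * STRUCT_PAGE_SIZE


def vmemmap_pages_per_hugepage(hugepage_bytes):
    """Number of base pages holding the struct pages of one huge page."""
    return memmap_bytes(hugepage_bytes) // PAGE_SIZE


if __name__ == "__main__":
    print("memmap for 1 TiB of RAM:", memmap_bytes(TiB) // GiB, "GiB")
    print("vmemmap pages per 2 MiB page:", vmemmap_pages_per_hugepage(2 * MiB))
    print("vmemmap pages per 1 GiB page:", vmemmap_pages_per_hugepage(GiB))
```

Under these assumptions the memmap costs about 1.56% of RAM, i.e. 16 GiB
on a 1 TiB host, and it is that metadata which either approach is trying
to get back.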


David / dhildenb
