Message-ID: <33a1c4ca-9f78-96ca-a774-3adea64aaed3@redhat.com>
Date: Mon, 7 Dec 2020 13:02:46 +0100
From: David Hildenbrand <david@...hat.com>
To: yulei.kernel@...il.com, linux-mm@...ck.org,
akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
naoya.horiguchi@....com, viro@...iv.linux.org.uk,
pbonzini@...hat.com, Dan Williams <dan.j.williams@...el.com>
Cc: joao.m.martins@...cle.com, rdunlap@...radead.org,
sean.j.christopherson@...el.com, xiaoguangrong.eric@...il.com,
kernellwp@...il.com, lihaiwei.kernel@...il.com,
Yulei Zhang <yuleixzhang@...cent.com>
Subject: Re: [RFC V2 00/37] Enhance memory utilization with DMEMFS
On 07.12.20 12:30, yulei.kernel@...il.com wrote:
> From: Yulei Zhang <yuleixzhang@...cent.com>
>
> In the current system, each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But with memory usage growing rapidly in cloud environments, we
> find that the resources consumed by page structure storage become
> more and more significant. So is it possible that we could reclaim
> such memory and make it reusable?
>
> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
>
> Dmemfs (Direct Memory filesystem) is a filesystem based on device
> memory or reserved memory. This kind of memory is special as it is
> not managed by the kernel and, most importantly, it has no 'struct
> page'. Therefore we can leverage the extra memory from the host
> system to support more tenants in our cloud service.
"is not managed by kernel" well, it's obviously is managed by the
kernel. It's not managed by the buddy ;)
How is this different from using "mem=X" and mapping the relevant memory
directly into applications? Is this "simply" a control instance on top
that makes sure unprivileged processes can access it without stepping
on each other's feet? Is that the reason why it's called a "file system"?
(an example would have helped here, showing how it's used)
It's worth noting that memory hotunplug, memory poisoning, and probably
more are currently fundamentally incompatible with this approach - which
should be pointed out in the cover letter.
Also, I think something similar can be obtained by using dax/hmat
infrastructure with "memmap=", at least I remember a talk where this was
discussed (but not sure if they modified the firmware to expose selected
memory as soft-reserved - we would only need a cmdline parameter to
achieve the same - Dan might know more).
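(For illustration, the existing flow I have in mind - sizes and offsets
made up, and assuming the legacy pmem-emulation syntax of "memmap="
plus the ndctl tooling:

    # kernel cmdline: treat 16G at physical offset 16G as emulated pmem
    memmap=16G!16G
    # then carve a device-dax instance out of it
    ndctl create-namespace --mode=devdax

An application can then mmap() the resulting /dev/daxX.Y directly.)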
>
> As the figure below shows, we use a kernel boot parameter 'dmem='
> to reserve system memory when the host system boots up; the
> remaining system memory is still managed by the system memory
> management code and associated with "struct page", while the
> reserved memory is managed by dmem and assigned to the guest
> system. The details can be found in
> Documentation/admin-guide/kernel-parameters.txt.
>
> +------------------+--------------------------------------+
> | system memory | memory for guest system |
> +------------------+--------------------------------------+
> | |
> v |
> struct page |
> | |
> v v
> system mem management dmem
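
(Side note for readers following along - a hypothetical setup; I'm
guessing the syntax, since the exact format only lives in the
patchset's kernel-parameters.txt update:

    # kernel cmdline: reserve memory for dmem (per-node size is my guess)
    dmem=4G
    # mount the filesystem over the reserved memory
    mount -t dmemfs none /mnt/dmemfs

)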
>
> And during usage, dmemfs will handle requests to allocate and
> free the reserved memory on each NUMA node. User space applications
> can use the mmap interface to access the memory, and kernel modules
> such as kvm and vfio are able to pin the memory through follow_pfn()
> and get_user_pages() at the given page size granularities.
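
(To make the intended userspace flow concrete, a minimal sketch of how
I read this - the mount point and file name are made up, and whether
pages are allocated at truncate or fault time is a patchset detail I'm
guessing at:

    /* sketch: map 1G of dmem via a file on a mounted dmemfs instance */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;
        int fd = open("/mnt/dmemfs/guest-mem", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* size the file to the amount of reserved memory to use */
        if (ftruncate(fd, len)) { perror("ftruncate"); return 1; }

        void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        /* the mapping could now be handed to KVM as guest RAM */
        munmap(mem, len);
        close(fd);
        return 0;
    }

KVM/vfio would then pin the pages via follow_pfn()/get_user_pages()
as described above.)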
I cannot say that I really like this approach. If reducing the memmap
size is all this is about, I really prefer the proposal to free up
most vmemmap pages for huge/gigantic pages instead.
--
Thanks,
David / dhildenb