Message-ID: <33a1c4ca-9f78-96ca-a774-3adea64aaed3@redhat.com>
Date: Mon, 7 Dec 2020 13:02:46 +0100
From: David Hildenbrand <david@...hat.com>
To: yulei.kernel@...il.com, linux-mm@...ck.org,
akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
naoya.horiguchi@....com, viro@...iv.linux.org.uk,
pbonzini@...hat.com, Dan Williams <dan.j.williams@...el.com>
Cc: joao.m.martins@...cle.com, rdunlap@...radead.org,
sean.j.christopherson@...el.com, xiaoguangrong.eric@...il.com,
kernellwp@...il.com, lihaiwei.kernel@...il.com,
Yulei Zhang <yuleixzhang@...cent.com>
Subject: Re: [RFC V2 00/37] Enhance memory utilization with DMEMFS
On 07.12.20 12:30, yulei.kernel@...il.com wrote:
> From: Yulei Zhang <yuleixzhang@...cent.com>
>
> In the current system, each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But with memory usage growing rapidly in cloud environments, we
> find that the resources consumed by page structure storage become
> more and more significant. So is it possible that we could reclaim
> such memory and make it reusable?
>
> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
>
> Dmemfs (Direct Memory filesystem) is a filesystem based on device
> memory or reserved memory. This kind of memory is special as it is
> not managed by the kernel and, most importantly, it has no 'struct
> page'. Therefore we can leverage the extra memory from the host
> system to support more tenants in our cloud service.
"is not managed by kernel" well, it's obviously is managed by the
kernel. It's not managed by the buddy ;)
How is this different from using "mem=X" and mapping the relevant memory
directly into applications? Is this "simply" a control instance on top
that makes sure unprivileged processes can access it without stepping
on each other's feet? Is that the reason why it's called a "file system"?
(an example would have helped here, showing how it's used)
It's worth noting that memory hotunplug, memory poisoning, and probably
more are currently fundamentally incompatible with this approach - which
should be pointed out in the cover letter.
Also, I think something similar can be obtained by using dax/hmat
infrastructure with "memmap=", at least I remember a talk where this was
discussed (but not sure if they modified the firmware to expose selected
memory as soft-reserved - we would only need a cmdline parameter to
achieve the same - Dan might know more).
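(For illustration, the existing flow I have in mind - sizes and offsets
made up, and assuming the legacy pmem-emulation syntax of "memmap="
plus the ndctl tooling:

    # kernel cmdline: treat 16G at physical offset 16G as emulated pmem
    memmap=16G!16G
    # then carve a device-dax instance out of it
    ndctl create-namespace --mode=devdax

An application can then mmap() the resulting /dev/daxX.Y directly.)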
>
> As the figure below shows, we use a kernel boot parameter 'dmem='
> to reserve system memory when the host system boots up; the
> remaining system memory is still managed by the system memory
> management code and associated with "struct page", while the
> reserved memory is managed by dmem and assigned to the guest
> system. The details can be found in
> Documentation/admin-guide/kernel-parameters.txt.
>
> +------------------+--------------------------------------+
> | system memory | memory for guest system |
> +------------------+--------------------------------------+
> | |
> v |
> struct page |
> | |
> v v
> system mem management dmem
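
(Side note for readers following along - a hypothetical setup; I'm
guessing the syntax, since the exact format only lives in the
patchset's kernel-parameters.txt update:

    # kernel cmdline: reserve memory for dmem (per-node size is my guess)
    dmem=4G
    # mount the filesystem over the reserved memory
    mount -t dmemfs none /mnt/dmemfs

)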
>
> And during usage, dmemfs will handle requests to allocate and
> free the reserved memory on each NUMA node. User space applications
> can use the mmap interface to access the memory, and kernel modules
> such as kvm and vfio are able to pin the memory through follow_pfn()
> and get_user_pages() at the given page size granularities.
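
(To make the intended userspace flow concrete, a minimal sketch of how
I read this - the mount point and file name are made up, and whether
pages are allocated at truncate or fault time is a patchset detail I'm
guessing at:

    /* sketch: map 1G of dmem via a file on a mounted dmemfs instance */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;
        int fd = open("/mnt/dmemfs/guest-mem", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* size the file to the amount of reserved memory to use */
        if (ftruncate(fd, len)) { perror("ftruncate"); return 1; }

        void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        /* the mapping could now be handed to KVM as guest RAM */
        munmap(mem, len);
        close(fd);
        return 0;
    }

KVM/vfio would then pin the pages via follow_pfn()/get_user_pages()
as described above.)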
I cannot say that I really like this approach. If reducing the memmap
size is all this is about, I really prefer the proposal to free up
most vmemmap pages for huge/gigantic pages instead.
--
Thanks,
David / dhildenb