linux-kernel - Re: [PATCH 0/5] Page demotion for memory reclaim


Message-ID: <2137A80F-CC90-411B-A1AF-A56384ADE0B8@nvidia.com>
Date:   Thu, 21 Mar 2019 17:20:54 -0700
From:   Zi Yan <ziy@...dia.com>
To:     Yang Shi <shy828301@...il.com>
CC:     Keith Busch <keith.busch@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>, <linux-nvdimm@...ts.01.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        John Hubbard <jhubbard@...dia.com>,
        Michal Hocko <mhocko@...e.com>,
        David Nellans <dnellans@...dia.com>
Subject: Re: [PATCH 0/5] Page demotion for memory reclaim

On 21 Mar 2019, at 16:02, Yang Shi wrote:

> On Thu, Mar 21, 2019 at 3:36 PM Keith Busch <keith.busch@...el.com> wrote:
>>
>> On Thu, Mar 21, 2019 at 02:20:51PM -0700, Zi Yan wrote:
>>> 1. The name “page demotion” seems confusing to me, since I thought it was about
>>> demoting large pages to small pages, as opposed to promoting small pages to THPs.
>>> Am I the only one here?
>>
>> If you have a THP, we'll skip the page migration and fall through to
>> split_huge_page_to_list(), then the smaller pages can be considered,
>> migrated and reclaimed individually. Not that we couldn't try to migrate
>> a THP directly. It was just a simpler implementation for this first attempt.
>>
>>> 2. For the demotion path, a common case would be from high-performance memory, like HBM
>>> or Multi-Channel DRAM, to DRAM, then to PMEM, and finally to disks, right? A more general
>>> demotion path would be derived from the memory performance description in HMAT[1],
>>> right? Do you have any algorithm to form such a path from HMAT?
>>
>> Yes, I have a PoC for the kernel setting up a demotion path based on
>> HMAT properties here:
>>
>>   https://git.kernel.org/pub/scm/linux/kernel/git/kbusch/linux.git/commit/?h=mm-migrate&id=4d007659e1dd1b0dad49514348be4441fbe7cadb
>>
>> The above is just from an experimental branch.
>>
>>> 3. Do you have a plan for promoting pages from lower-level memory to higher-level memory,
>>> like from PMEM to DRAM? Will this one-way demotion make all pages sink to PMEM and disk?
>>
>> Promoting previously demoted pages would require the application do
>> something to make that happen if you turn demotion on with this series.
>> Kernel auto-promotion is still being investigated, and it's a little
>> trickier than reclaim.
>
> Just FYI. I'm currently working on a patchset which tries to promote
> pages from second tier memory (i.e. PMEM) to DRAM via NUMA balancing.
> But NUMA balancing can't deal with unmapped page cache, which has to
> be promoted through a different path, i.e. mark_page_accessed().

Got it. Another concern is that NUMA balancing marks pages inaccessible
to obtain access information, which adds overhead on top of the page
migration overhead. Since the benefit of migrating pages from PMEM to DRAM
is smaller than that of bringing data from disk to DRAM, the overhead might
offset the benefit, meaning you might see performance degradation.
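
To make that trade-off concrete, here is a rough back-of-the-envelope sketch
of the break-even point. It is only an illustration: the function names and
every latency and cost value below are hypothetical placeholders, not
measurements and not anything from the patch series.

```python
import math


def promotion_overhead_ns(migration_cost_ns, hint_fault_cost_ns, hint_faults):
    """One-time cost of promoting a page: the migration itself plus the
    NUMA hinting faults taken to learn that the page is being accessed."""
    return migration_cost_ns + hint_faults * hint_fault_cost_ns


def net_benefit_ns(accesses, pmem_latency_ns, dram_latency_ns, overhead_ns):
    """Time saved by serving `accesses` from DRAM instead of PMEM,
    minus the one-time promotion overhead. Negative means a net loss."""
    return accesses * (pmem_latency_ns - dram_latency_ns) - overhead_ns


def break_even_accesses(pmem_latency_ns, dram_latency_ns, overhead_ns):
    """Smallest number of post-promotion accesses at which the promotion
    no longer loses time, or None if DRAM is not faster than PMEM."""
    gain_per_access = pmem_latency_ns - dram_latency_ns
    if gain_per_access <= 0:
        return None
    return math.ceil(overhead_ns / gain_per_access)


# Placeholder values, chosen only to show the arithmetic.
overhead = promotion_overhead_ns(
    migration_cost_ns=2000, hint_fault_cost_ns=1000, hint_faults=2
)
print(break_even_accesses(300, 100, overhead))
print(net_benefit_ns(10, 300, 100, overhead))
```

The smaller the PMEM-to-DRAM latency gap, the more accesses a promoted page
needs before the hinting-fault and migration costs are paid back.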

>
> And I do agree with Keith, promotion is definitely trickier than
> reclaim since the kernel can't recognize "hot" pages accurately. NUMA
> balancing is still coarse-grained and inaccurate, but it is simple. If
> we would like to implement a more sophisticated algorithm, an in-kernel
> implementation might not be a good idea.

I agree. Alternatively, hardware vendors like Intel could expose more
information on page hotness, such as multi-bit access bits or the
page-modification log that Intel provides for virtualization.



--
Best Regards,
Yan Zi
