linux-kernel - Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b9b40585-cb59-3d42-bcf8-e59bff77c663@intel.com>
Date:   Tue, 16 Apr 2019 07:30:20 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Yang Shi <yang.shi@...ux.alibaba.com>
Cc:     mgorman@...hsingularity.net, riel@...riel.com, hannes@...xchg.org,
        akpm@...ux-foundation.org, keith.busch@...el.com,
        dan.j.williams@...el.com, fengguang.wu@...el.com, fan.du@...el.com,
        ying.huang@...el.com, ziy@...dia.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

On 4/16/19 12:47 AM, Michal Hocko wrote:
> You definitely have to follow policy. You cannot demote to a node which
> is outside of the cpuset/mempolicy because you are breaking contract
> expected by the userspace. That implies doing a rmap walk.

What *is* the contract with userspace, anyway? :)

Obviously, the preferred policy doesn't have any strict contract.

The strict binding has a bit more of a contract, but it doesn't prevent
swapping.  Strict binding also doesn't keep another app from moving the
memory.

We have a reasonable argument that demotion is better than swapping.
So, we could say that even if a VMA has a strict NUMA policy, demoting
pages mapped there pages still beats swapping them or tossing the page
cache.  It's doing them a favor to demote them.

Or, maybe we just need a swap hybrid where demotion moves the page but
keeps it unmapped and in the swap cache.  That way an access gets a
fault and we can promote the page back to where it should be.  That
would be faster than I/O-based swap for sure.

Anyway, I agree that the kernel probably shouldn't be moving pages
around willy-nilly with no consideration for memory policies, but users
might give us some wiggle room too.