Message-ID: <CAKTCnzm+nyAaNMmPgKQVNtxdbTLoarnGtebT=MfsZaSq8Umw0w@mail.gmail.com>
Date:   Wed, 1 Mar 2017 21:59:21 +1100
From:   Balbir Singh <bsingharora@...il.com>
To:     Mel Gorman <mgorman@...e.de>
Cc:     Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Minchan Kim <minchan@...nel.org>,
        Aneesh Kumar KV <aneesh.kumar@...ux.vnet.ibm.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        haren@...ux.vnet.ibm.com,
        Jérôme Glisse <jglisse@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH V3 0/4] Define coherent device memory node

On Wed, Mar 1, 2017 at 8:55 PM, Mel Gorman <mgorman@...e.de> wrote:
> On Wed, Mar 01, 2017 at 01:42:40PM +1100, Balbir Singh wrote:
>> >>>The idea of this patchset was to introduce
>> >>>the concept of memory that is not necessarily system memory, but is coherent
>> >>>in terms of visibility/access with some restrictions
>> >>>
>> >>
>> >>Which should be done without special casing the page allocator, cpusets and
>> >>special casing how cpusets are handled. It's not necessary for any other
>> >>mechanism used to restrict access to portions of memory such as cpusets,
>> >>mempolicies or even memblock reservations.
>> >
>> >Agreed, I mentioned a limitation that we see with cpusets. I do agree
>> >that we should reuse any infrastructure we have, but cpusets are more
>> >static in nature and inheritance compared to the requirements of CDM.
>> >
>>
>> Mel, I went back and looked at cpusets and found some limitations that
>> I mentioned earlier. Isolating a particular node requires a fair amount
>> of laborious work: all tasks must be moved away from the root cpuset
>> into a child cpuset that has everything but the node we intend to
>> isolate, leaving the root cpuset empty. Even with hardwalling, this
>> does not prevent allocations from the parent cpuset.
>>
>
> That it is difficult does not in itself justify adding a third mechanism
> specific to one type of device for controlling access to memory.
>

Not only is it difficult, there are also several tasks that refuse to
change cpusets once created. I also noticed that the isolation may
begin a little too late: some allocations may already have ended up on
the node we intend to isolate.
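For concreteness, the cpuset dance described above could be sketched roughly as
follows. This is only a sketch: the v1 cpuset mount point /dev/cpuset, the CPU
range 0-7, and the choice of node 1 as the node to isolate are all assumptions,
and without a real cpuset mount (and root) it merely reports the writes it
would make:

```python
import os

ROOT = "/dev/cpuset"   # assumed v1 cpuset mount point
# Assumed topology: nodes 0-1, CPUs 0-7; node 1 is the node to isolate.

def step(path, value):
    """Write to a cpuset control file; if that fails (no cpuset mount,
    not root), just report the intended write."""
    try:
        with open(path, "w") as f:
            f.write(value)
    except OSError:
        print(f'would write "{value}" to {path}')

def isolate_node():
    # 1. Create a child cpuset that holds everything except node 1,
    #    leaving the root cpuset's mems effectively unused.
    if os.path.ismount(ROOT):
        os.makedirs(f"{ROOT}/system", exist_ok=True)
    step(f"{ROOT}/system/cpuset.cpus", "0-7")   # all CPUs
    step(f"{ROOT}/system/cpuset.mems", "0")     # every node but node 1
    # 2. Hardwall the child -- though, as noted above, this still does
    #    not prevent allocations from the parent cpuset.
    step(f"{ROOT}/system/cpuset.mem_hardwall", "1")
    # 3. Migrate every task out of the root cpuset; tasks that refuse
    #    cpuset changes stay behind in the root (and hence on node 1).
    try:
        with open(f"{ROOT}/tasks") as tasks:
            for pid in tasks:
                step(f"{ROOT}/system/tasks", pid.strip())
    except OSError:
        print(f"would migrate each pid in {ROOT}/tasks to {ROOT}/system/tasks")

isolate_node()
```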

I also want to eventually control whether automatic NUMA
balancing/kswapd/reclaim etc. run on this node (something that cpusets
do not provide). The reasons for these decisions depend heavily on the
properties of the node, and the isolation mechanism that exists today
is insufficient. Moreover, the correct abstraction for device memory
would be a node-state class similar to N_MEMORY, but limited in what we
include (which is why I was asking whether questions 3 and 4 are
clear). You might argue that these are then not NUMA nodes, but in a
general sense they are NUMA nodes (with non-uniform properties and
access times), and NUMA, given the right hardware, lets us expose the
right programming model. Please consider reading the full details at

https://patchwork.kernel.org/patch/9566393/
https://lkml.org/lkml/2016/11/22/339

cpusets are designed for slab/kernel allocations and user memory
policies; we need more control for things like the ones I mentioned
above. We would definitely work with the existing framework so that we
do not duplicate code or add more complexity.
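For illustration, the node-state class alluded to above could look roughly
like the toy model below. This is a userspace sketch, not kernel code: the
state name N_COHERENT_DEVICE is hypothetical, and the point is only that a
device node can be in N_MEMORY yet be excluded from the mask that default
allocations consult:

```python
# Toy model of the kernel's node-state masks: each state is a set of
# node ids, analogous to node_states[] and N_MEMORY in the kernel.
N_ONLINE, N_MEMORY, N_COHERENT_DEVICE = range(3)  # last one hypothetical

node_states = {N_ONLINE: set(), N_MEMORY: set(), N_COHERENT_DEVICE: set()}

def node_set_state(nid, state):
    node_states[state].add(nid)

def default_alloc_nodes():
    """Nodes eligible for default allocations: nodes that have memory,
    minus coherent-device nodes, which must be requested explicitly."""
    return sorted(node_states[N_MEMORY] - node_states[N_COHERENT_DEVICE])

# Node 0: ordinary system memory; node 1: coherent device memory.
for nid in (0, 1):
    node_set_state(nid, N_ONLINE)
    node_set_state(nid, N_MEMORY)
node_set_state(1, N_COHERENT_DEVICE)

print(default_alloc_nodes())  # -> [0]; only node 0 is used by default
```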

>> I am trying to understand the concerns that you/Michal/Vlastimil have
>> so that Anshuman/I/other stake holders can respond to the concerns
>> in one place if that makes sense. Here are the concerns I have heard
>> so far
>>
>> 1. Let's not add any overhead to the page allocator path
>
> Yes and that includes both runtime overhead and maintenance overhead.
> Littering the allocator paths with special casing with runtime overhead
> masked by static branches would still be a maintenance burden given that
> most people will not have the hardware necessary to avoid regressions.
>

I understand that. We'll try to keep the allocator changes to zero and
reuse cpusets where possible, but in all my experiments cpusets come up
short on some counts: cpusets use nodes, yet do not represent node
characteristics or work the way N_MEMORY does.

>> 2. Let's try to keep the allocator changes easy to read/parse
>
> No, simply do not add a new mechanism for controlling access to memory
> when cpusets and memory policies already exist.
>

Same as above

>> 3. Why do we need a NUMA interface?
>> 4. How does this compare with HMM?
>
> Others discussed this topic in detail.
>
>> 5. Why can't we use cpusets?
>>
>
> That is your assertion. The concerns you have are that the work is
> laborious and that designing the administrative interfaces may be
> difficult. In itself that does not justify adding a third mechanism for
> controlling memory access.

Laborious and insufficient as stated above unless we make changes to
the way cpusets work.

Thanks for the feedback,
Balbir Singh.
