linux-kernel - Re: [PATCH -V8 02/10] mm/numa: automatically generate node migration order

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <7E27A89B-DAFD-43E3-B90D-76E90FEE2EDD@nvidia.com>
Date:   Tue, 22 Jun 2021 08:48:14 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     "Huang, Ying" <ying.huang@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Yang Shi <shy828301@...il.com>,
        Michal Hocko <mhocko@...e.com>, Wei Xu <weixugc@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Dan Williams <dan.j.williams@...el.com>,
        David Hildenbrand <david@...hat.com>,
        osalvador <osalvador@...e.de>
Subject: Re: [PATCH -V8 02/10] mm/numa: automatically generate node migration order

On 22 Jun 2021, at 8:06, Dave Hansen wrote:

> Yan, your reply came through in HTML.  It doesn't bother me too much,
> but you'll find your replies dropped by LKML and other mailing lists
> if you do this.

Apologies. I used the wrong text mode. Thanks for letting me know.

>
> On 6/21/21 7:50 AM, Zi Yan wrote:
>> Is there a plan of allowing user to change where the migration path
>> starts? Or maybe one step further providing an interface to allow
>> user to specify the demotion path. Something like
>> /sys/devices/system/node/node*/node_demotion.
>
> We actually had this in an earlier series.  I pulled it out because we
> don't really *need* this ABI at the moment.  But, I totally agree that
> it would be handy for many things, including any non-obvious topology
> where the built-in ordering isn't optimal.
>
>> I don't think that's necessary at least for now. Do you know any
>> real world use case for this?
>>
>> In our P9+volta system, GPU memory is exposed as a NUMA node. For
>> the GPU workloads with data size greater than GPU memory size, it
>> will be very helpful to allow pages in GPU memory to be
>> migrated/demoted to CPU memory. With your current assumption, GPU
>> memory -> CPU memory demotion seems not possible, right? This
>> should also apply to any system with a device memory exposed as a
>> NUMA node and workloads running on the device and using CPU memory
>> as a lower tier memory than the device memory.
>
> Yes, with the current ordering, CPU memory would be demoted to the
> GPU, not the other way around.  The right way to fix this (on ACPI
> platforms at least) is probably to use the HMAT table and build the
> demotion based on any memory targets rather than just CPUs.
>
> That would be a great future enhancement to all of this.  But, because
> not all systems have HMATs, we also need something more basic, which
> is what is in this series.

This information is very helpful. I agree that reading HMAT table is
the right way. I will look into it. Thanks!


—
Best Regards,
Yan, Zi

Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)