Date: Wed, 5 May 2021 13:06:09 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Tejun Heo <tj@...nel.org>, hannes@...xchg.org
Cc: akpm@...ux-foundation.org, lizefan.x@...edance.com, corbet@....net,
cgroups@...r.kernel.org, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Phishing Risk] [External] Re: [PATCH 2/3] cgroup/cpuset:
introduce cpuset.mems.migration
ping :)
On 4/27/21 10:43 PM, Tejun Heo wrote:
> Hello,
>
> On Mon, Apr 26, 2021 at 02:59:45PM +0800, Abel Wu wrote:
>> When a NUMA node is assigned to numa-service, the workload
>> on that node needs to be moved away quickly and completely. The
>> main aspects we care about for the eviction are as follows:
>>
>> a) it should complete soon enough that numa-services do not
>> wait so long that user experience suffers
>> b) the workloads to be evicted may use massive amounts of
>> memory, and migrating that much memory may cause a sudden,
>> severe performance drop lasting tens of seconds that
>> certain workloads cannot afford
>> c) the impact of the eviction should be limited within the
>> source and destination nodes
>> d) cgroup interface is preferred
>>
>> So we came up with the following idea:
>>
>> 1) fire up numa-services without waiting for memory migration
>> 2) memory migration can be done asynchronously by using spare
>> memory bandwidth
>>
>> AutoNUMA seems to be a solution, but its scope is global, which
>> violates c) and d). And cpuset.memory_migrate operates in a synchronous
>
> I don't think d) in itself is a valid requirement. How does it violate c)?
>
>> fashion, which breaks a) and b). So a mixture of the two, the new cgroup2
>> interface cpuset.mems.migration, is introduced.
>>
>> The new cpuset.mems.migration supports three modes:
>>
>> - "none" mode, meaning migration disabled
>> - "sync" mode, which is exactly the same as the cgroup v1
>> interface cpuset.memory_migrate
>> - "lazy" mode, which, unlike cpuset.memory_migrate, only
>> marks pages protnone while walking through them; NUMA
>> faults triggered by later accesses then perform the
>> migration.
>
> cpuset is already involved in NUMA allocation but it always felt like
> something bolted on - it's weird to have cpu to NUMA node settings at global
> level and then to have possibly conflicting direct NUMA configuration via
> cpuset. My preference would be putting as much configuration as possible on
> the mm / autonuma side and let cpuset's node confinements further restrict
> their operations rather than cpuset having its own set of policy
> configurations.
>
> Johannes, what are your thoughts?
>
> Thanks.
>