Message-ID: <50a5ae9e-6dbd-51b6-a374-1b0e45588abf@linux.alibaba.com>
Date: Fri, 12 Jul 2019 12:03:23 +0800
From: 王贇 <yun.wang@...ux.alibaba.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: hannes@...xchg.org, mhocko@...nel.org, vdavydov.dev@...il.com,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, mcgrof@...nel.org, keescook@...omium.org,
linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org,
Mel Gorman <mgorman@...e.de>, riel@...riel.com
Subject: Re: [PATCH 3/4] numa: introduce numa group per task group
On 2019/7/11 at 10:10 PM, Peter Zijlstra wrote:
> On Wed, Jul 03, 2019 at 11:32:32AM +0800, 王贇 wrote:
>> By tracing numa page faults, we recognize tasks sharing the same pages
>> and try to pack them together into a single numa group.
>>
>> However, when two tasks share a lot of page cache but few anonymous
>> pages, they have no chance to join the same group, since numa balancing
>> does not trace cache pages.
>>
>> Tracing cache pages would cost too much, but we could use some hints from
>
> I forgot; where again do we skip shared pages? task_numa_work() doesn't
> seem to skip file vmas.
That's the page cache generated by file read/write, rather than the pages
backing a file mapping. Pages serving such IO won't be considered shared
between tasks, since they don't belong to any particular task, although
they may be serving multiple ones.
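To make the distinction concrete, here is a minimal userspace sketch (my
own illustration, not from the patch; the file path is made up): data
touched only via read() lives in the kernel page cache without any vma, so
the vma walk in task_numa_work() can never generate hinting faults for it,
while an mmap() of the same file can be scanned:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	char buf[4096];
	void *map;
	int fd = open("/tmp/shared_data", O_RDONLY); /* hypothetical file */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Path 1: buffered read. The data lands in the kernel page
	 * cache; no vma is involved, so NUMA balancing never scans
	 * these pages.
	 */
	if (read(fd, buf, sizeof(buf)) < 0)
		perror("read");

	/*
	 * Path 2: file mapping. The pages appear in a file-backed vma,
	 * which task_numa_work() does walk, so they can take NUMA
	 * hinting faults.
	 */
	map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		perror("mmap");
	else
		munmap(map, 4096);

	close(fd);
	return 0;
}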
>
>> userland, and the cpu cgroup could be a good one.
>>
>> This patch introduces a new entry 'numa_group' for the cpu cgroup; by
>> echoing non-zero into the entry, we can now force all tasks of this
>> cgroup to join the same numa group serving the task group.
>>
>> In this way tasks are more likely to settle down on the same node,
>> sharing closer cpu caches and benefiting from NUMA for both file and
>> anonymous pages.
>>
>> Besides, when multiple cgroups enable numa group, they will be able to
>> exchange task locations by utilizing numa migration, and in this way
>> achieve single-node settling without breaking load balance.
>
> I dislike cgroup-only interfaces; is there really nothing else we could
> use for this?
Me too... but at this moment that's the best approach we have got. We also
tried using a separate module to handle this automatically, but that needs
a very good understanding of the system, configuration and workloads, which
only the owner knows.
So maybe just providing the functionality and leaving the choice to the
user is not that bad?
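For reference, usage would look something like the sketch below (the exact
entry name and cgroup mount path are my assumptions, based on a typical
cgroup-v1 cpu hierarchy, not confirmed by the patch):

#include <stdio.h>

int main(void)
{
	/* hypothetical path; assumes a cgroup-v1 style cpu hierarchy */
	FILE *f = fopen("/sys/fs/cgroup/cpu/mygroup/cpu.numa_group", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}

	/* non-zero: pull every task of this cgroup into one numa group */
	fputs("1\n", f);
	fclose(f);
	return 0;
}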
Regards,
Michael Wang
>