[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tueq6av4.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 30 Dec 2021 11:16:15 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: SeongJae Park <sj@...nel.org>
Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>,
akpm@...ux-foundation.org, dave.hansen@...ux.intel.com,
ziy@...dia.com, shy828301@...il.com,
zhongjiang-ali@...ux.alibaba.com, xlpang@...ux.alibaba.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered
memory system
SeongJae Park <sj@...nel.org> writes:
> On Wed, 29 Dec 2021 09:33:56 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
>
>> SeongJae Park <sj@...nel.org> writes:
>>
>> > Hello,
>> >
>> > On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
>> >
>> >> Hi, SeongJae,
>> >>
>> >> SeongJae Park <sj@...nel.org> writes:
>> >>
>> >> > Hi,
>> >> >
>> >> > On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
>> >>
>> >> [snip]
>> >>
>> >> >> It's good to avoid to change the source code of an application to apply
>> >> >> some memory management optimization (for example, use DAMON +
>> >> >> madvise()). But it's much easier to run a user space daemon to optimize
>> >> >> for the application. (for example, use DAMON + other information +
>> >> >> process_madvise()).
>> >> >>
>> >> >> And this kind of per-application optimization is kind of application
>> >> >> specific policy. This kind of policy may be too complex and flexible to
>> >> >> be put in the kernel directly. For example, in addition to DAMON, some
>> >> >> other application specific or system knowledge may be helpful too, so we
>> >> >> have process_madvise() for that before DAMON. Some more complex
>> >> >> algorithm may be needed for some applications.
>> >> >>
>> >> >> And this kind of application specific policy usually need complex
>> >> >> configuration. It's hard to export all these policy parameters to the
>> >> >> user space as the kernel ABI. Now, DAMON schemes parameters are
>> >> >> exported in debugfs so they are not considered ABI. So they may be
>> >> >> changed at any time. But applications need some stable and
>> >> >> well-maintained ABI.
>> >> >>
>> >> >> All in all, IMHO, what we need is a user space per-application policy
>> >> >> daemon with the information from DAMON and other sources.
>> >> >
>> >> > I basically agree to Ying, as I also noted in the coverletter of DAMOS
>> >> > patchset[1]:
>> >> >
>> >> > DAMON[1] can be used as a primitive for data access aware memory
>> >> > management optimizations. For that, users who want such optimizations
>> >> > should run DAMON, read the monitoring results, analyze it, plan a new
>> >> > memory management scheme, and apply the new scheme by themselves. Such
>> >> > efforts will be inevitable for some complicated optimizations.
>> >> >
>> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
>> >> >
>> >> > That is, I believe some programs and big companies would definitely have their
>> >> > own information and want such kind of complicated optimizations. But, such
>> >> > optimizations would depend on characteristics of each program and require
>> >> > investment of some amount of resources. Some other programs and users wouldn't
>> >> > have such special information, and/or resource to invest for such
>> >> > optimizations. For them, some amount of benefit would be helpful enough even
>> >> > though its sub-optimal.
>> >> >
>> >> > I think we should help both groups, and DAMOS could be useful for the second
>> >> > group. And I don't think DAMOS is useless for the first group. They could use
>> >> > their information-based policy in prallel to DAMOS in some cases. E.g., if
>> >> > they have a way to predict the data access pattern of specific memory region
>> >> > even without help from DAMON, they can use their own policy for the region but
>> >> > DAMOS for other regions.
>> >> >
>> >> > Someone could ask why not implement a user-space implementation for the second
>> >> > group, then. First of all, DAMOS is not only for the user-space driven virtual
>> >> > memory management optimization, but also for kernel-space programs and any
>> >> > DAMOS-supportable address spaces including the physical address space. And,
>> >> > another important goal of DAMOS for user space driven use case in addition to
>> >> > reducing the redundant code is minimizing the user-kernel context switch
>> >> > overhead for passing the monitoring results information and memory management
>> >> > action requests.
>> >> >
>> >> > In summary, I agree the user space per-application policy daemon will be useful
>> >> > for the specialized ultimate optimizations, but we also need DAMOS for another
>> >> > common group of cases.
>> >> >
>> >> > If I'm missing something, please feel free to let me know.
>> >>
>> >> I guess that most end-users and quite some system administrators of
>> >> small companies have no enough capability to take advantage of the
>> >> per-application optimizations. How do they know the appropriate region
>> >> number and proactive reclaim threshold?
>> >>
>> >> So per my understanding, Linux kernel
>> >> need provide,
>> >>
>> >> 1. An in-kernel general policy that is obviously correct and benefits
>> >> almost all users and applications, at least no regression. No
>> >> complex configuration or deep knowledge is needed to take advantage
>> >> of it.
>> >>
>> >> 2. Some way to inspect and control system and application behavior, so
>> >> that some advanced and customized user space policy daemons can be
>> >> built to satisfy some advanced users who have the enough knowledge
>> >> for the applications and systems, for example, oomd.
>> >
>> > Agreed, and I think that's the approach that DAMON is currently taking. In
>> > specific, we provide DAMON debugfs interface for users who want to inspect and
>> > control their system and application behavior. Using it, we also made a PoC
>> > level user space policy daemon[1].
>> >
>> > For the in-kernel policies, we are developing DAMON-based kernel components one
>> > by one, for specific usages. DAMON-based proactive reclamation module
>> > (DAMON_RECLAIM) is one such example. Such DAMON-based components will remove
>> > complex tunables that necessary for the general inspection and control of the
>> > system but unnecessary for their specific purpose (e.g., proactive reclamation)
>> > to allow users use it in a simple manner. Also, those will use conservative
>> > default configs to not incur visible regression. For example, DAMON_RECLAIM
>> > uses only up to 1% of single CPU time for the reclamation by default.
>>
>> I don't think DAMON schemes are the in-kernel general policy I mentioned
>> above (1.). For example, NUMA balancing is a general policy to optimize
>> performance. It tries to provide a general policy that works for all
>> users with as few as possible tunables. If some tunables are needed,
>> they will be provided as ABI.
>
> Exactly. What I'm saying is, DAMON schemes that exposed to user space via the
> debugfs interface is for inspection of system and development of user space
> daemon (2.). It requires some level of tuning and doesn't provide stable ABI
> but the debugfs interface. Meanwhile, DAMON-based kernel components like
> DAMON_RECLAIM can be used for the in-kernel general policy (1.). For example,
> DAMON_RECLAIM also tries to be beneficial or at least incur no regression for
> almost every users, provides as few as possible tunables, and provides those
> via its ABI (module parameters), not debugfs.
Thanks for your detailed explanation.
Per my understanding, DAMON schemes are kind of building blocks of some
kernel feature such as DAMON_RECLAIM. Whether do we need a new scheme
depends on whether it's useful as part of some kernel feature. Do you
agree?
Best Regards,
Huang, Ying
> Thanks,
> SJ
>
>>
>> Best Regards,
>> Huang, Ying
>>
>> > In short, I think we're on the same page, and adding DEMOTION scheme action
>> > could be helpful for the users who want to efficiently inspect and control the
>> > system/application behavior for their tiered memory systems. It's unclear how
>> > much benefit this could give to users, though. I assume Baolin would come back
>> > with some sort of numbers in the next spin. Nevertheless, I personally don't
>> > think that's a critical blocker, as this patch is essentially just adding a way
>> > for using the pre-existing primitive, namely move_pages(), in a little bit more
>> > efficient manner, for the access pattern-based use cases.
>> >
>> > If I'm missing something, please feel free to let me know.
>> >
>> > [1] https://github.com/awslabs/damoos
>> >
>> >
>> > Thanks,
>> > SJ
>> >
>> >>
>> >> Best Regards,
>> >> Huang, Ying
>>
Powered by blists - more mailing lists