linux-kernel - Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system

Message-Id: <20211230080335.24320-1-sj@kernel.org>
Date:   Thu, 30 Dec 2021 08:03:35 +0000
From:   SeongJae Park <sj@...nel.org>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     SeongJae Park <sj@...nel.org>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        akpm@...ux-foundation.org, dave.hansen@...ux.intel.com,
        ziy@...dia.com, shy828301@...il.com,
        zhongjiang-ali@...ux.alibaba.com, xlpang@...ux.alibaba.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH v2 0/2] Add a new scheme to support demotion on tiered memory system

On Thu, 30 Dec 2021 11:16:15 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:

> SeongJae Park <sj@...nel.org> writes:
> 
> > On Wed, 29 Dec 2021 09:33:56 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
> >
> >> SeongJae Park <sj@...nel.org> writes:
> >> 
> >> > Hello,
> >> >
> >> > On Mon, 27 Dec 2021 11:09:56 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
> >> >
> >> >> Hi, SeongJae,
> >> >> 
> >> >> SeongJae Park <sj@...nel.org> writes:
> >> >> 
> >> >> > Hi,
> >> >> >
> >> >> > On Thu, 23 Dec 2021 15:51:18 +0800 "Huang, Ying" <ying.huang@...el.com> wrote:
> >> >> 
> >> >> [snip]
> >> >> 
> >> >> >> It's good to avoid changing the source code of an application to apply
> >> >> >> some memory management optimization (for example, using DAMON +
> >> >> >> madvise()).  But it's much easier to run a user space daemon to optimize
> >> >> >> for the application (for example, using DAMON + other information +
> >> >> >> process_madvise()).
> >> >> >> 
> >> >> >> And this kind of per-application optimization is a kind of
> >> >> >> application-specific policy.  Such a policy may be too complex and
> >> >> >> flexible to be put in the kernel directly.  For example, in addition to
> >> >> >> DAMON, other application-specific or system knowledge may be helpful
> >> >> >> too, which is why we had process_madvise() for that even before DAMON.
> >> >> >> Some applications may need more complex algorithms.
> >> >> >> 
> >> >> >> And this kind of application-specific policy usually needs complex
> >> >> >> configuration.  It's hard to export all these policy parameters to
> >> >> >> user space as kernel ABI.  Currently, DAMON scheme parameters are
> >> >> >> exported via debugfs, so they are not considered ABI and may be
> >> >> >> changed at any time.  But applications need a stable and
> >> >> >> well-maintained ABI.
> >> >> >> 
> >> >> >> All in all, IMHO, what we need is a user space per-application policy
> >> >> >> daemon that uses information from DAMON and other sources.
> >> >> >
> >> >> > I basically agree with Ying, as I also noted in the cover letter of the
> >> >> > DAMOS patchset[1]:
> >> >> >
> >> >> >     DAMON[1] can be used as a primitive for data access aware memory
> >> >> >     management optimizations.  For that, users who want such optimizations
> >> >> >     should run DAMON, read the monitoring results, analyze it, plan a new
> >> >> >     memory management scheme, and apply the new scheme by themselves.  Such
> >> >> >     efforts will be inevitable for some complicated optimizations.
> >> >> >
> >> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fda504fade7f124858d7022341dc46ff35b45274
> >> >> >
> >> >> > That is, I believe some programs and big companies definitely have their
> >> >> > own information and want such complicated optimizations.  However, such
> >> >> > optimizations depend on the characteristics of each program and require
> >> >> > investing some amount of resources.  Other programs and users may not
> >> >> > have such special information and/or the resources to invest in such
> >> >> > optimizations.  For them, a modest benefit would be helpful enough, even
> >> >> > if it's sub-optimal.
> >> >> >
> >> >> > I think we should help both groups, and DAMOS could be useful for the second
> >> >> > group.  And I don't think DAMOS is useless for the first group.  In some
> >> >> > cases, they could use their information-based policy in parallel with
> >> >> > DAMOS.  E.g., if they have a way to predict the data access pattern of a
> >> >> > specific memory region even without help from DAMON, they can use their
> >> >> > own policy for that region and DAMOS for the other regions.
> >> >> >
> >> >> > Someone could then ask why we don't provide a user-space implementation
> >> >> > for the second group.  First of all, DAMOS is not only for user-space-driven
> >> >> > virtual memory management optimization, but also for kernel-space programs
> >> >> > and any DAMOS-supportable address space, including the physical address
> >> >> > space.  Also, for the user-space-driven use case, another important goal
> >> >> > of DAMOS, in addition to reducing redundant code, is minimizing the
> >> >> > user-kernel context switch overhead of passing monitoring results and
> >> >> > memory management action requests.
> >> >> >
> >> >> > In summary, I agree that a user space per-application policy daemon will be
> >> >> > useful for specialized ultimate optimizations, but we also need DAMOS for
> >> >> > another, common group of cases.
> >> >> >
> >> >> > If I'm missing something, please feel free to let me know.
> >> >> 
> >> >> I guess that most end users, and quite a few system administrators of
> >> >> small companies, don't have enough capability to take advantage of
> >> >> per-application optimizations.  How would they know the appropriate
> >> >> number of regions and the proactive reclaim threshold?
> >> >> 
> >> >> So, per my understanding, the Linux kernel needs to provide:
> >> >> 
> >> >> 1. An in-kernel general policy that is obviously correct and benefits
> >> >>    almost all users and applications, or at least causes no regression.
> >> >>    No complex configuration or deep knowledge is needed to take advantage
> >> >>    of it.
> >> >> 
> >> >> 2. Some way to inspect and control system and application behavior, so
> >> >>    that advanced, customized user space policy daemons (for example,
> >> >>    oomd) can be built for advanced users who have enough knowledge of
> >> >>    their applications and systems.
> >> >
> >> > Agreed, and I think that's the approach DAMON is currently taking.
> >> > Specifically, we provide the DAMON debugfs interface for users who want to
> >> > inspect and control their system and application behavior.  Using it, we
> >> > also made a PoC-level user space policy daemon[1].
> >> >
> >> > For the in-kernel policies, we are developing DAMON-based kernel components one
> >> > by one, for specific usages.  The DAMON-based proactive reclamation module
> >> > (DAMON_RECLAIM) is one such example.  Such DAMON-based components will remove
> >> > complex tunables that are necessary for general inspection and control of the
> >> > system but unnecessary for their specific purpose (e.g., proactive
> >> > reclamation), so that users can use them in a simple manner.  Also, they will
> >> > use conservative default configs to avoid visible regressions.  For example,
> >> > DAMON_RECLAIM uses at most 1% of a single CPU's time for reclamation by
> >> > default.
> >> 
> >> I don't think DAMON schemes are the in-kernel general policy I mentioned
> >> above (1.).  For example, NUMA balancing is a general policy to optimize
> >> performance.  It tries to provide a general policy that works for all
> >> users with as few tunables as possible.  If some tunables are needed,
> >> they will be provided as ABI.
> >
> > Exactly.  What I'm saying is that DAMON schemes exposed to user space via the
> > debugfs interface are for system inspection and user space daemon
> > development (2.).  They require some level of tuning and provide no stable
> > ABI, only the debugfs interface.  Meanwhile, DAMON-based kernel components
> > like DAMON_RECLAIM can serve as the in-kernel general policy (1.).  For
> > example, DAMON_RECLAIM also tries to be beneficial, or at least cause no
> > regression, for almost all users, provides as few tunables as possible, and
> > exposes them via its ABI (module parameters), not debugfs.
> 
> Thanks for your detailed explanation.
> 
> Per my understanding, DAMON schemes are a kind of building block for
> kernel features such as DAMON_RECLAIM.

I'm pretty sure you understand my point perfectly.

> Whether we need a new scheme depends on whether it's useful as part of
> some kernel feature.  Do you agree?

Yes, agreed.


Thanks,
SJ

> 
> Best Regards,
> Huang, Ying
> 
> > Thanks,
> > SJ
> >
> >> 
> >> Best Regards,
> >> Huang, Ying
> >> 
> >> > In short, I think we're on the same page, and adding the DEMOTION scheme
> >> > action could be helpful for users who want to efficiently inspect and
> >> > control system/application behavior on their tiered memory systems.  It's
> >> > unclear how much benefit this could give users, though.  I assume Baolin
> >> > will come back with some numbers in the next spin.  Nevertheless, I
> >> > personally don't think that's a critical blocker, as this patch essentially
> >> > just adds a way to use the pre-existing primitive, namely move_pages(), a
> >> > little more efficiently for access pattern-based use cases.
> >> >
> >> > If I'm missing something, please feel free to let me know.
> >> >
> >> > [1] https://github.com/awslabs/damoos
> >> >
> >> >
> >> > Thanks,
> >> > SJ
> >> >
> >> >> 
> >> >> Best Regards,
> >> >> Huang, Ying
> >>