Message-Id: <20210728083643.5873-1-sjpark@amazon.de>
Date:   Wed, 28 Jul 2021 08:36:43 +0000
From:   SeongJae Park <sj38.park@...il.com>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     SeongJae Park <sj38.park@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        SeongJae Park <sjpark@...zon.de>, Jonathan.Cameron@...wei.com,
        amit@...nel.org, Jonathan Corbet <corbet@....net>,
        David Hildenbrand <david@...hat.com>, dwmw@...zon.com,
        foersleo@...zon.de, Greg Thelen <gthelen@...gle.com>,
        jgowans@...zon.com, mheyne@...zon.de,
        David Rientjes <rientjes@...gle.com>, sieberf@...zon.com,
        Vlastimil Babka <vbabka@...e.cz>, linux-damon@...zon.com,
        Linux MM <linux-mm@...ck.org>, linux-doc@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, Wei Xu <weixugc@...gle.com>,
        Paul Turner <pjt@...gle.com>, Yu Zhao <yuzhao@...gle.com>,
        Dave Hansen <dave.hansen@...el.com>
Subject: Re: [PATCH v34 00/13] Introduce Data Access MONitor (DAMON)

From: SeongJae Park <sjpark@...zon.de>

Hello,


On Tue, 27 Jul 2021 14:30:38 -0700 Shakeel Butt <shakeelb@...gle.com> wrote:

> (reduced CC list)
> 
> Hi all,
> 
> I have been asked to comment if Google is interested in using this
> feature, its general usefulness and if it is sufficiently general and
> non-duplicative. I will try to answer these but first I will explain
> the use-cases we are particularly interested in and for which we want
> a general access monitoring mechanism.

Thank you for sharing your great opinion below, Shakeel.

> 
> At the moment Google is particularly interested in four use-cases:
> 
> 1) Working set estimation: This is used for cluster level scheduling
> and controlling the knobs of memory overcommit.
> 
> 2) Proactive reclaim
> 
> 3) Balancing between memory tiers: Moving hot pages to fast tiers and
> cold pages to slow tiers
> 
> 4) Hugepage optimization: Hot memory backed by hugepages
> 
> In addition, these uses are not happening in isolation. We want a
> combination of these running concurrently on a system. So, it is clear
> that the first version or step of DAMON which only targets virtual
> address space monitoring is not sufficient for these use-cases.
> 
> I think the more important question is if DAMON can be extended to
> system level monitoring to fulfill these use-cases.

I also think this is the important point.  The main purpose of the DAMON
patchset is to provide a flexible monitoring framework which can easily be
extended for many use cases.  Once we have the framework, I believe people will
be able to extend it for their usages, and others will be able to reuse those
extensions (a snowball can start rolling).

> Address space monitoring is a core concept in DAMON and it has implemented
> address space based optimizations (i.e. dividing address space into regions,
> assuming locality within regions, random sampling within regions instead of
> looking at each page and dynamically adjusting regions).  There is a followup
> proposal on monitoring physical address space in DAMON. However for systems
> running multiple workloads, the address space optimizations core to DAMON
> would be ineffective.

Right.  If the system is running a huge number of different workloads (e.g.,
systems running a huge number of virtual machines or Kubernetes-managed
containers), the accuracy of DAMON's region-based monitoring could be lowered.

However, I'd like to note that there are many people running only a small
number of major workloads on their systems.  Even in the above case of systems
running a huge number of virtual machines, each virtual machine would have only
a small number of major workloads.  People can use DAMON inside the guests.  We
also confirmed that the region-based physical address space monitoring achieves
high accuracy on such production systems (we found the hottest 4KB memory
region in 70GB of memory).
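
To make the accuracy trade-off concrete, here is a minimal sketch of
region-based sampling in Python.  It is only an illustration of the idea
described in the quoted text (one randomly chosen page is checked per region
per sampling interval), not the DAMON implementation.  The function name, the
(start, end) page-index representation of regions, and the set of accessed
pages are all assumptions made for this example.

```python
import random


def count_sampled_accesses(regions, accessed_pages, nr_samples, seed=0):
    """Estimate per-region access counts by random sampling.

    regions: list of (start, end) page-index ranges, end exclusive
             (hypothetical representation, not DAMON's data structure).
    accessed_pages: set of page indexes assumed to be accessed in every
                    sampling interval.
    Returns one counter per region: how many of the nr_samples checks
    landed on an accessed page.
    """
    rng = random.Random(seed)
    nr_accesses = [0] * len(regions)
    for _ in range(nr_samples):
        for i, (start, end) in enumerate(regions):
            # Check a single random page instead of every page in the region.
            page = rng.randrange(start, end)
            if page in accessed_pages:
                nr_accesses[i] += 1
    return nr_accesses


# Regions whose pages share the same access pattern are measured exactly.
print(count_sampled_accesses([(0, 4), (4, 8)], {0, 1, 2, 3}, 10))
# One region mixing hot and cold pages only yields a rough estimate.
print(count_sampled_accesses([(0, 8)], {0, 1, 2, 3}, 10))
```

When every page in a region behaves the same way, a single sample per region
is as good as checking all pages.  When unrelated workloads are interleaved in
one region, the counter only reflects the mix, which is the accuracy loss
discussed above.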

Also, the region-based monitoring is not mandatory.  The followup proposal
which extends DAMON for physical address space monitoring[1] allows people to
opt out of it if they want.  In addition to that, it implements
page-granularity monitoring.  I'm unsure if the implementation fits Google's
usage, but I'm sure you can at least implement your own on the framework
without the limitation of the regions abstraction.

> 
> There are discussions/brainstorming on supporting abstract address
> space based on LRUs which is somewhat similar to Multigen LRU [1]
> proposal but not well articulated yet. BTW Multigen LRU [1] is another
> similar proposal but targets one specific use-case i.e. memory reclaim
> (proactive reclaim). Anyways I think we need more brainstorming for a
> generalized solution of system level access monitoring.

The idea is to use the positional index of each page in its LRU list as its
address.  For example, a page at the head of an LRU list will have address 0.
In this address space, we can safely assume that pages adjacent to each other
will have similar access frequency, and therefore DAMON's region-based
monitoring would work.  Further, based on the monitoring results, we can
proactively move pages in the LRU list so that pages near the head of the list
have higher access frequencies.

For example, if we see the monitoring results below from the address space:

    <HEAD of a LRU list> HHHHHHHMMMMMMMCCCCHHCCCCC <TAIL of a LRU list>
    (H: Hot page, M: Mid-temperature page, C: Cold page)

We can move the hot pages near the tail to the head, as below:

    <HEAD of a LRU list> HHHHHHHHHMMMMMMMCCCCCCCCC <TAIL of a LRU list>
    (H: Hot page, M: Mid-temperature page, C: Cold page)

This will improve not only the monitoring accuracy but also other mechanisms,
such as reclamation, which rely on the assumption behind the LRU list.

As Shakeel also mentioned, this is only at the brainstorming stage, though.
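
The LRU-position idea above can be sketched in a few lines of Python.  This is
only a toy model of the brainstormed scheme, under the assumption that each
page already carries a monitored temperature label; the names lru_addresses,
reorder_lru and TEMPERATURE_RANK are hypothetical and do not exist in DAMON or
the kernel.  A stable sort stands in for "moving hot pages toward the head"; a
real implementation would presumably move individual list entries rather than
sort the whole list.

```python
# Hypothetical ranking: lower rank means closer to the head of the LRU list.
TEMPERATURE_RANK = {"H": 0, "M": 1, "C": 2}


def lru_addresses(pages):
    """Map each page to its positional index in the LRU list.

    The page at the head of the list gets address 0.
    """
    return {page: index for index, page in enumerate(pages)}


def reorder_lru(temperatures):
    """Return the LRU list with hotter pages moved toward the head.

    temperatures: string of 'H', 'M', 'C' labels, head of the list first.
    sorted() is stable, so pages of equal temperature keep their
    relative order.
    """
    return "".join(sorted(temperatures, key=TEMPERATURE_RANK.__getitem__))


before = "HHHHHHHMMMMMMMCCCCHHCCCCC"
print(reorder_lru(before))  # HHHHHHHHHMMMMMMMCCCCCCCCC
```

After the reordering, pages of similar temperature occupy contiguous address
ranges, which is the locality property that region-based monitoring relies on.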

> 
> Regarding merging DAMON, I personally think there are users who might
> be interested in only their virtual address space and DAMON is
> providing a solution for such users. SeongJae can provide more details
> or knowledge if any big user other than Amazon is interested in the
> feature.

AFAIR, Huawei, Intel, and Alibaba have shown some level of interest publicly
and/or privately, so far.  They did code reviews and/or tests and sent bug
reports.  Also, a number of researchers and individuals have reached out to me.

> DAMON does not expose stable APIs at the moment, so these can
> be changed later if needed. I think it is ok to merge DAMON for some
> exposure. However I do want to make this clear that the solution space
> is not complete. The solution of system level monitoring is still
> needed which can be a future extension to DAMON or more generalized
> Multigen LRU.

Agreed.  We have a lot more work to do.  Some of it is already posted as RFC
patchsets[1,2,3,4].  I promise I will happily do the work.  But why should I be
the only one to have all the fun?  I'd like to do that together with others in
this great community.  One major purpose of this patchset is thus providing a
flexible framework for such collaboration.  The virtual address space
monitoring, which this patchset provides in addition to the framework, is also
for real-world usages, though.

Now all the patches have at least one 'Reviewed-by:' or 'Acked-by:' tag.  We
haven't found serious problems since v26[5], which was posted about four months
ago, so I think this patchset has passed the minimum qualification.  If
you think there are more things to be done before this patchset is merged in
the -mm tree or mainline, please let me know.  If not, Andrew, I'd like you to
consider merging this patchset into the '-mm' tree.


Thanks,
SeongJae Park

> 
> thanks,
> Shakeel
> 
> [1] https://lore.kernel.org/lkml/20210520065355.2736558-1-yuzhao@google.com/

[1] https://lore.kernel.org/linux-mm/20201216094221.11898-1-sjpark@amazon.com/
[2] https://lore.kernel.org/linux-mm/20201216084404.23183-1-sjpark@amazon.com/
[3] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjpark@amazon.com/
[4] https://lore.kernel.org/linux-mm/20210720131309.22073-1-sj38.park@gmail.com/
[5] https://lore.kernel.org/linux-mm/20210330090537.12143-1-sj38.park@gmail.com/
