Message-ID: <CAJD7tkZSgRs6T60Gv4dZR5xBemxgCB_2s8hz8zB0F_nakN5aTQ@mail.gmail.com>
Date:   Wed, 18 May 2022 15:46:24 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Vaibhav Jain <vaibhav@...ux.ibm.com>
Cc:     cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Tejun Heo <tj@...nel.org>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        Shakeel Butt <shakeelb@...gle.com>
Subject: Re: [PATCH] memcg: provide reclaim stats via 'memory.reclaim'

On Wed, May 18, 2022 at 3:38 PM Vaibhav Jain <vaibhav@...ux.ibm.com> wrote:
>
> [1] Provides a way for user-space to trigger proactive reclaim by introducing
> a write-only memcg file 'memory.reclaim'. However, reclaim stats like the
> number of pages scanned and reclaimed are still not directly available to
> user-space.
>
> This patch proposes to extend [1] to make the memcg file 'memory.reclaim'
> readable, returning the number of pages scanned / reclaimed during the
> reclaim process from the 'struct vmpressure' associated with each memcg. This
> should let user-space assess how successful the proactive reclaim triggered
> through 'memory.reclaim' was.

Isn't this a racy read? struct vmpressure can be changed between the
write and read by other reclaim operations, right?

I was actually planning to send a patch that does not update
vmpressure for user-controlled reclaim, similar to how PSI is handled.

The interface currently returns -EBUSY if the entire amount was not
reclaimed, so isn't this enough to figure out whether it was successful
or not? If not, we can store the scanned / reclaimed counts of the last
memory.reclaim invocation for the sole purpose of memory.reclaim
reads. Maybe it is actually more intuitive for users to just read the
amount of memory reclaimed, in a format similar to the one written?

i.e.:
echo "10M" > memory.reclaim
cat memory.reclaim
9M
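The alternative suggested above, reading back the amount reclaimed by the last write in the same format that was written, could look roughly like the following sketch. parse_size() and format_size() are hypothetical helpers, not kernel or libc functions; the sketch assumes 1024-based K/M/G/T suffixes in the spirit of the kernel's memparse() and omits overflow checks and larger suffixes. The kernel side would still need to remember the result of the last invocation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helpers (not kernel or libc APIs) sketching the suggested
 * read-back format. Assumes 1024-based K/M/G/T suffixes, similar in spirit
 * to the kernel's memparse(); larger suffixes and overflow checks omitted.
 */

/* Parse "<digits>[KMGT]" into bytes. Returns 0 on success, -1 on error. */
static int parse_size(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long val;

	if (!isdigit((unsigned char)s[0]))
		return -1;			/* rejects "", "-1", " 1" */
	val = strtoull(s, &end, 10);

	switch (toupper((unsigned char)*end)) {
	case 'T': val <<= 10;		/* fall through */
	case 'G': val <<= 10;		/* fall through */
	case 'M': val <<= 10;		/* fall through */
	case 'K': val <<= 10; end++; break;
	default: break;			/* no suffix: plain bytes */
	}

	while (isspace((unsigned char)*end))
		end++;			/* tolerate the newline echo appends */
	if (*end != '\0')
		return -1;		/* trailing garbage, e.g. "10X" */

	*out = val;
	return 0;
}

/* Format bytes with the largest suffix that divides the value exactly. */
static void format_size(uint64_t bytes, char *buf, size_t len)
{
	static const char suffix[] = "KMGT";
	int i = -1;

	assert(buf != NULL && len > 0);
	while (i < 3 && bytes != 0 && bytes % 1024 == 0) {
		bytes /= 1024;
		i++;
	}
	if (i < 0)
		snprintf(buf, len, "%llu", (unsigned long long)bytes);
	else
		snprintf(buf, len, "%llu%c", (unsigned long long)bytes, suffix[i]);
}
```

With these, a write of "10M" parses to 10485760 bytes, and a reclaimed total of 9437184 bytes formats back as "9M". Since the patch reports page counts, a byte figure would first need the page count multiplied by the page size.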

>
> With the patch, the following command flow is expected:
>
>  # echo "1M" > memory.reclaim
>
>  # cat memory.reclaim
>    scanned 76
>    reclaimed 32
>
> [1]:  https://lore.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com
>
> Cc: Shakeel Butt <shakeelb@...gle.com>
> Cc: Yosry Ahmed <yosryahmed@...gle.com>
> Signed-off-by: Vaibhav Jain <vaibhav@...ux.ibm.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 15 ++++++++++++---
>  mm/memcontrol.c                         | 14 ++++++++++++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 27ebef2485a3..44610165261d 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1209,18 +1209,27 @@ PAGE_SIZE multiple when read back.
>         utility is limited to providing the final safety net.
>
>    memory.reclaim
> -       A write-only nested-keyed file which exists for all cgroups.
> +       A nested-keyed file which exists for all cgroups.
>
> -       This is a simple interface to trigger memory reclaim in the
> -       target cgroup.
> +       This is a simple interface to trigger memory reclaim and retrieve
> +       reclaim stats in the target cgroup.
>
>         This file accepts a single key, the number of bytes to reclaim.
>         No nested keys are currently supported.
>
> +       Reading the file returns the number of pages scanned and the
> +       number of pages reclaimed from the memcg. This information is
> +       fetched from the vmpressure info associated with each cgroup.
> +
>         Example::
>
>           echo "1G" > memory.reclaim
>
> +         cat memory.reclaim
> +
> +         scanned 78
> +         reclaimed 30
> +
>         The interface can be later extended with nested keys to
>         configure the reclaim behavior. For example, specify the
>         type of memory to reclaim from (anon, file, ..).
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2e2bfbed4717..9e43580a8726 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6423,6 +6423,19 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>         return nbytes;
>  }
>
> +static int memory_reclaim_show(struct seq_file *m, void *v)
> +{
> +       struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +       struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
> +
> +       spin_lock(&vmpr->sr_lock);
> +       seq_printf(m, "scanned %lu\nreclaimed %lu\n",
> +                  vmpr->scanned, vmpr->reclaimed);
> +       spin_unlock(&vmpr->sr_lock);
> +
> +       return 0;
> +}
> +
>  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>                               size_t nbytes, loff_t off)
>  {
> @@ -6525,6 +6538,7 @@ static struct cftype memory_files[] = {
>                 .name = "reclaim",
>                 .flags = CFTYPE_NS_DELEGATABLE,
>                 .write = memory_reclaim,
> +               .seq_show  = memory_reclaim_show,
>         },
>         { }     /* terminate */
>  };
> --
> 2.35.1
>
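A user-space consumer of the command flow proposed in this patch might look like the sketch below. All function names are hypothetical. request_reclaim() relies only on the existing write side of memory.reclaim and its -EBUSY return when the full amount was not reclaimed; parse_reclaim_stats() and read_reclaim_stats() assume the patch's proposed "scanned N\nreclaimed M\n" read format, which is not part of the merged interface.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helpers. The read side assumes the format proposed in this
 * patch ("scanned N\nreclaimed M\n"), which may never be merged.
 */

/* Returns 0 on success, -EBUSY if the full amount was not reclaimed,
 * or another negative errno on failure. */
static int request_reclaim(const char *path, const char *amount)
{
	int ret = 0;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;
	if (write(fd, amount, strlen(amount)) < 0)
		ret = -errno;	/* saved before close() can clobber errno */
	close(fd);
	return ret;
}

/* Parse the proposed output; both keys are expected, in this order. */
static int parse_reclaim_stats(const char *text, unsigned long *scanned,
			       unsigned long *reclaimed)
{
	assert(text && scanned && reclaimed);
	/* Whitespace in the format also matches the newline between keys. */
	if (sscanf(text, "scanned %lu reclaimed %lu", scanned, reclaimed) != 2)
		return -EINVAL;
	return 0;
}

/* Read and parse the stats file; returns 0 or a negative errno. */
static int read_reclaim_stats(const char *path, unsigned long *scanned,
			      unsigned long *reclaimed)
{
	char buf[128];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);
	buf[n] = '\0';
	return parse_reclaim_stats(buf, scanned, reclaimed);
}
```

For a cgroup assumed to live at /sys/fs/cgroup/test, a caller would pass "/sys/fs/cgroup/test/memory.reclaim" to request_reclaim() with "1M" and then call read_reclaim_stats() on the same path. Given the race raised in this reply, the counts read back may also include reclaim performed by others in between.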
