Message-ID: <20191107174555.GA116752@cmpxchg.org>
Date: Thu, 7 Nov 2019 09:45:55 -0800
From: Johannes Weiner <hannes@...xchg.org>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Linux MM <linux-mm@...ck.org>,
Cgroups <cgroups@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH 00/11] mm: fix page aging across multiple cgroups
On Wed, Nov 06, 2019 at 06:50:25PM -0800, Shakeel Butt wrote:
> On Mon, Jun 3, 2019 at 2:59 PM Johannes Weiner <hannes@...xchg.org> wrote:
> >
> > When applications are put into unconfigured cgroups for memory
> > accounting purposes, the cgrouping itself should not change the
> > behavior of the page reclaim code. We expect the VM to reclaim the
> > coldest pages in the system. But right now the VM can reclaim hot
> > pages in one cgroup while there is eligible cold cache in others.
> >
> > This is because one part of the reclaim algorithm isn't truly cgroup
> > hierarchy aware: the inactive/active list balancing. That is the part
> > that is supposed to protect hot cache data from one-off streaming IO.
> >
> > The recursive cgroup reclaim scheme will scan and rotate the physical
> > LRU lists of each eligible cgroup at the same rate in a round-robin
> > fashion, thereby establishing a relative order among the pages of all
> > those cgroups. However, the inactive/active balancing decisions are
> > made locally within each cgroup, so when a cgroup is running low on
> > cold pages, its hot pages will get reclaimed - even when sibling
> > cgroups have plenty of cold cache eligible in the same reclaim run.
> >
> > For example:
> >
> > [root@ham ~]# head -n1 /proc/meminfo
> > MemTotal: 1016336 kB
> >
> > [root@ham ~]# ./reclaimtest2.sh
> > Establishing 50M active files in cgroup A...
> > Hot pages cached: 12800/12800 workingset-a
> > Linearly scanning through 18G of file data in cgroup B:
> > real 0m4.269s
> > user 0m0.051s
> > sys 0m4.182s
> > Hot pages cached: 134/12800 workingset-a
> >
>
> Can you share reclaimtest2.sh as well? Maybe a selftest to
> monitor/test future changes.
I wish it were more portable, but it really only does what it says in
the log output, in a pretty hacky way, with all parameters hard-coded
to my test environment:
---
#!/bin/bash
# this should protect workingset-a from workingset-b
set -e
#set -x
echo Establishing 50M active files in cgroup A...
rmdir /cgroup/workingset-a 2>/dev/null || true
mkdir /cgroup/workingset-a
echo $$ > /cgroup/workingset-a/cgroup.procs
rm -f workingset-a
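# create a sparse 50M file - sparse is fine here, reading the
# holes still instantiates page cache pages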
dd of=workingset-a bs=1M count=0 seek=50 2>/dev/null >/dev/null
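# read the file repeatedly so its pages are referenced again
# and promoted to the active list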
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
cat workingset-a > /dev/null
echo -n "Hot pages cached: "
./mincore workingset-a
echo -n "Linearly scanning through 2G of file data in cgroup B:"
rmdir /cgroup/workingset-b 2>/dev/null || true
mkdir /cgroup/workingset-b
echo $$ > /cgroup/workingset-b/cgroup.procs
rm -f workingset-b
dd of=workingset-b bs=1M count=0 seek=2048 2>/dev/null >/dev/null
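# stream through the 2G file over and over; it doesn't fit into
# the ~1G of RAM, so every pass is cold, one-off streaming IO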
time (
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null
cat workingset-b > /dev/null )
echo -n "Hot pages cached: "
./mincore workingset-a
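The ./mincore binary is a local helper that isn't included above.
A minimal sketch of what it presumably does - map the file and
count resident pages with mincore(2) to produce the
"cached/total filename" lines in the log - could look like this
(a guessed reconstruction, not the original tool):

/*
 * mincore.c - print how many pages of a file are resident in the
 * page cache, as "resident/total filename".
 *
 * Assumption: this mirrors what the unposted ./mincore helper
 * does. Build: cc -o mincore mincore.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t pages, resident = 0, i;
	unsigned char *vec;
	struct stat st;
	void *map;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}
	pages = (st.st_size + pagesize - 1) / pagesize;
	map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	vec = malloc(pages);
	if (!vec || mincore(map, st.st_size, vec) < 0) {
		perror("mincore");
		return 1;
	}
	/* the low bit of each vector byte is set for resident pages */
	for (i = 0; i < pages; i++)
		resident += vec[i] & 1;
	printf("%zu/%zu %s\n", resident, pages, argv[1]);
	return 0;
}

With that in place, the script should only further depend on a
cgroup hierarchy being mounted at /cgroup with the memory
controller enabled.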