lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <YW1rcv4bN1WWhzLD@dhcp22.suse.cz>
Date:   Mon, 18 Oct 2021 14:41:22 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Zhaoyang Huang <huangzhaoyang@...il.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Zhaoyang Huang <zhaoyang.huang@...soc.com>,
        "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: skip current when memcg reclaim

On Mon 18-10-21 17:25:23, Zhaoyang Huang wrote:
> On Mon, Oct 18, 2021 at 4:23 PM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Fri 15-10-21 14:15:29, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@...soc.com>
> > >
> > > Sibling threads of the same process could refault the reclaimed pages
> > > at the same time, which would be typical in non-global reclaim and
> > > would introduce thrashing.
> >
> > It is hard to understand what kind of problem you see (ideally along
> > with some numbers) and how the proposed patch addresses that problem.
> >
> > Also, you are missing the Signed-off-by tag (please have a look at
> > Documentation/process/submitting-patches.rst, which is much more
> > comprehensive about the process).
> Sorry for that, I will fix it.
> >
> > > ---
> > >  mm/vmscan.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 5199b96..ebbdc37 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2841,6 +2841,11 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > >                               sc->memcg_low_skipped = 1;
> > >                               continue;
> > >                       }
> > > +                     /*
> > > +                      * Don't bother current when its memcg is below low
> > > +                      */
> > > +                     if (get_mem_cgroup_from_mm(current->mm) == memcg)
> > > +                             continue;
> >
> > This code is executed when none of the memcgs in the reclaimed hierarchy
> > could be reclaimed. The low limit is then ignored, and this change
> > tweaks that behavior without any description of the effect. A very
> > vague note about thrashing would indicate that you have something like
> > the following:
> >
> >         A (hitting its hard limit)
> >        / \
> >       B   C
> >
> > Both B and C are protected by their low limits and the current task is
> > associated with B. As neither of the two could be reclaimed due to the
> > soft protection, you prefer to reclaim from C, because reclaiming from
> > the current process could reclaim current's working set. Correct?
> >
> > I would be really curious about more specifics of the hierarchy used.
> What I am facing is a typical scenario on Android: a big
> memory-consuming app (camera, etc.) is launched while the background is
> filled by other processes. The hierarchy is like the one you describe
> above, where B represents the app and memory.low is set to help warm
> restarts. Both kswapd and direct reclaim work together to reclaim pages
> in this scenario, which can cause 20MB of file pages to be deleted from
> the LRU within several seconds. This change could help the current
> process's pages escape reclaim, which would otherwise cause page
> thrashing. We observed the result via systrace, which shows that
> uninterruptible sleep (blocked on a page bit) and iowait are lower than
> usual.

I still have a hard time understanding the exact setup and why the patch
helps you. If you want to protect B beyond what its low limit allows by
stealing from C, then anybody reclaiming from C can do the same thing to
B, so in the end there is no protection. The same would apply to any
global direct memory reclaim done by a third party. So I suspect that
your patch just happens to work by luck.
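To make the argument concrete, the decision the quoted hunk adds to shrink_node_memcgs() can be modeled in userspace C. This is a minimal sketch, not kernel code: struct memcg_model, scan_pass(), reclaim_targets(), example_children and its page counts are all hypothetical. It assumes each child's effective low equals its configured memory.low, and it simplifies the retry so that the low-ignoring second pass runs only when the first pass scanned nothing after skipping a protected child. With both B and C below low, a reclaimer in B scans only C, a reclaimer in C scans only B, and a reclaimer in neither child (current_idx == -1) scans both, so the patch only moves the pressure around depending on who reclaims.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one child memcg of A (units: pages). */
struct memcg_model {
	const char *name;
	unsigned long usage; /* charged pages */
	unsigned long low;   /* memory.low, assumed equal to the effective low */
};

/* Protected when usage <= low, roughly the test in mem_cgroup_below_low(). */
static bool below_low(const struct memcg_model *m)
{
	return m->usage <= m->low;
}

/*
 * One pass over A's children; returns a bitmask of the children scanned.
 * low_reclaim == false: first pass, children below low are skipped.
 * low_reclaim == true: second pass, low is ignored, except that the
 * proposed patch also skips the reclaimer's own memcg (current_idx).
 */
static unsigned int scan_pass(const struct memcg_model *kids, int n,
			      int current_idx, bool low_reclaim,
			      bool patched, bool *low_skipped)
{
	unsigned int scanned = 0;

	/* The bitmask can only represent as many children as it has bits. */
	assert(n >= 0 && n <= (int)(sizeof(unsigned int) * CHAR_BIT));
	for (int i = 0; i < n; i++) {
		if (below_low(&kids[i])) {
			if (!low_reclaim) {
				*low_skipped = true;
				continue;
			}
			if (patched && i == current_idx)
				continue; /* the check added by the patch */
		}
		scanned |= 1u << i;
	}
	return scanned;
}

/*
 * Simplification: the second pass runs only if the first pass scanned
 * nothing and skipped at least one protected child.
 * current_idx == -1 models a reclaimer whose memcg is neither child.
 */
static unsigned int reclaim_targets(const struct memcg_model *kids, int n,
				    int current_idx, bool patched)
{
	bool low_skipped = false;
	unsigned int scanned = scan_pass(kids, n, current_idx, false,
					 patched, &low_skipped);

	if (scanned == 0 && low_skipped)
		scanned = scan_pass(kids, n, current_idx, true, patched,
				    &low_skipped);
	return scanned;
}

/* Hypothetical numbers: both B and C sit below their memory.low. */
static const struct memcg_model example_children[] = {
	{ "B", 300, 400 }, /* the foreground app */
	{ "C", 200, 300 }, /* background processes */
};
```

For example, reclaim_targets(example_children, 2, 0, true) yields 0x2 (only C), while the same call with patched set to false yields 0x3. The model ignores reference counting; note that get_mem_cgroup_from_mm() returns the memcg with a reference held, which the quoted hunk never drops with css_put().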

Why do both B and C have a low limit set up such that neither can be
reclaimed? Isn't that a weird setup where A's hard limit is too close to
the sum of the low limits of B and C?
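The arithmetic behind this question can be written as a small check. It is a sketch under stated assumptions: low_limits_cover_max() is a hypothetical helper, all memory charged to A is assumed to belong to its children, and "hitting the hard limit" is treated as usage(A) reaching memory.max. If every child is at or below its low limit, usage(A) is at most the sum of the low limits, so A can only sit at its limit with both children protected when that sum is at least memory.max.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: can parent A reach its hard limit (memory.max)
 * while every child is still at or below its memory.low, so that the
 * first reclaim pass skips all of them?
 * Assumes all memory charged to A belongs to its children.
 */
static bool low_limits_cover_max(unsigned long a_max,
				 const unsigned long *child_low, int n)
{
	unsigned long sum_low = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		sum_low += child_low[i];
	/*
	 * At the limit, usage(A) == a_max. If every child is below low,
	 * usage(A) == sum(usage) <= sum(low). Both can hold only when
	 * sum(low) >= a_max.
	 */
	return sum_low >= a_max;
}
```

With hypothetical low limits of 400 and 300 pages, a memory.max of 600 pages makes that state reachable, while 1000 pages does not.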

In other words, could you share a more detailed description of the
configuration you are using and some more details on why both B and C
were skipped during the first pass of the reclaim?

-- 
Michal Hocko
SUSE Labs
