Message-ID: <20180320083950.GD23100@dhcp22.suse.cz>
Date:   Tue, 20 Mar 2018 09:39:50 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     "Li,Rongqing" <lirongqing@...du.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>
Subject: Re: Re: Re: [PATCH] mm/memcontrol.c: speed up to force empty a memory cgroup

On Mon 19-03-18 10:51:57, David Rientjes wrote:
> On Mon, 19 Mar 2018, Li,Rongqing wrote:
> 
> > > > Although SWAP_CLUSTER_MAX is used at the lower level, the call
> > > > stack of try_to_free_mem_cgroup_pages is too long; increasing
> > > > nr_to_reclaim reduces the number of calls to these
> > > > functions [do_try_to_free_pages, shrink_zones, shrink_node]
> > > >
> > > > mem_cgroup_resize_limit
> > > > --->try_to_free_mem_cgroup_pages:  .nr_to_reclaim = max(1024,
> > > > --->SWAP_CLUSTER_MAX),
> > > >    ---> do_try_to_free_pages
> > > >      ---> shrink_zones
> > > >       --->shrink_node
> > > >        ---> shrink_node_memcg
> > > >          ---> shrink_list          <------- loop happens here
> > > >                                            [times = 1024/32]
> > > >            ---> shrink_page_list
> > > 
> > > Can you actually measure this to be the culprit? Because we should
> > > rethink our call path if it is too complicated/deep to perform well.
> > > Adding arbitrary batch sizes doesn't sound like a good way to go to me.
> > 
> > Ok, I will try
> > 
> 
> Looping in mem_cgroup_resize_limit(), which takes memcg_limit_mutex on 
> every iteration and thereby contends with limit changes in other cgroups 
> (on our systems, thousands), while calling try_to_free_mem_cgroup_pages() 
> with less than SWAP_CLUSTER_MAX is lame.

Well, if the global lock is a bottleneck in your deployments then we
can come up with something more clever, e.g. per-hierarchy locking,
or even dropping the lock for the reclaim altogether. If we reclaim in
SWAP_CLUSTER_MAX batches then the potential over-reclaim risk is quite
low when multiple users are shrinking the same (sub)hierarchy.

> It would probably be best to limit the 
> nr_pages to the amount that needs to be reclaimed, though, rather than 
> over-reclaiming.

How do you achieve that? The charging path is not synchronized with the
shrinking one at all.

> If you wanted to be invasive, you could change page_counter_limit() to 
> return the count - limit, fix up the callers that look for -EBUSY, and 
> then use max(val, SWAP_CLUSTER_MAX) as your nr_pages.

I am not sure I understand.

-- 
Michal Hocko
SUSE Labs
