Message-ID: <20180320083950.GD23100@dhcp22.suse.cz>
Date:   Tue, 20 Mar 2018 09:39:50 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     "Li,Rongqing" <lirongqing@...du.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>
Subject: Re: Re: Re: [PATCH] mm/memcontrol.c: speed up to force empty a memory cgroup

On Mon 19-03-18 10:51:57, David Rientjes wrote:
> On Mon, 19 Mar 2018, Li,Rongqing wrote:
> 
> > > > Although SWAP_CLUSTER_MAX is used at the lower level, the call
> > > > stack of try_to_free_mem_cgroup_pages is too long; increasing
> > > > nr_to_reclaim reduces the number of calls to these
> > > > functions [do_try_to_free_pages, shrink_zones, shrink_node]
> > > >
> > > > mem_cgroup_resize_limit
> > > > --->try_to_free_mem_cgroup_pages:  .nr_to_reclaim = max(1024,
> > > > --->SWAP_CLUSTER_MAX),
> > > >    ---> do_try_to_free_pages
> > > >      ---> shrink_zones
> > > >       --->shrink_node
> > > >        ---> shrink_node_memcg
> > > >          ---> shrink_list          <------- loop happens here
> > > >                                            [times = 1024/32]
> > > >            ---> shrink_page_list
> > > 
> > > Can you actually measure this to be the culprit? Because we should
> > > rethink our call path if it is too complicated/deep to perform well.
> > > Adding arbitrary batch sizes doesn't sound like a good way to go to me.
> > 
> > Ok, I will try
> > 
> 
> Looping in mem_cgroup_resize_limit(), which takes memcg_limit_mutex on 
> every iteration and thereby contends with limit changes in other cgroups 
> (on our systems, thousands), while calling try_to_free_mem_cgroup_pages() 
> with less than SWAP_CLUSTER_MAX is lame.

Well, if the global lock is a bottleneck in your deployments then we
can come up with something more clever, e.g. per-hierarchy locking,
or even dropping the lock for the reclaim altogether. If we reclaim in
SWAP_CLUSTER_MAX batches then the potential over-reclaim risk is quite
low when multiple users are shrinking the same (sub)hierarchy.

> It would probably be best to limit the 
> nr_pages to the amount that needs to be reclaimed, though, rather than 
> over-reclaiming.

How do you achieve that? The charging path is not synchronized with the
shrinking one at all.

> If you wanted to be invasive, you could change page_counter_limit() to 
> return the count - limit, fix up the callers that look for -EBUSY, and 
> then use max(val, SWAP_CLUSTER_MAX) as your nr_pages.

I am not sure I understand.

-- 
Michal Hocko
SUSE Labs
