linux-kernel - Re: regression caused by cgroups optimization in 3.17-rc2

Message-ID: <20140910162936.GI25219@dhcp22.suse.cz>
Date:	Wed, 10 Sep 2014 18:29:36 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Dave Hansen <dave@...1.net>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Hugh Dickins <hughd@...gle.com>,
	Dave Hansen <dave.hansen@...el.com>, Tejun Heo <tj@...nel.org>,
	Linux-MM <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Vladimir Davydov <vdavydov@...allels.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: regression caused by cgroups optimization in 3.17-rc2

On Fri 05-09-14 11:25:37, Michal Hocko wrote:
> On Thu 04-09-14 13:27:26, Dave Hansen wrote:
> > On 09/04/2014 07:27 AM, Michal Hocko wrote:
> > > Ouch. free_pages_and_swap_cache completely kills the uncharge batching
> > > because it reduces it to PAGEVEC_SIZE batches.
> > > 
> > > I think we really do not need PAGEVEC_SIZE batching anymore. We are
> > > already batching on tlb_gather layer. That one is limited so I think
> > > the below should be safe but I have to think about this some more. There
> > > is a risk of prolonged lru_lock wait times but the number of pages is
> > > limited to 10k and the heavy work is done outside of the lock. If this
> > > is really a problem then we can tear LRU part and the actual
> > > freeing/uncharging into a separate functions in this path.
> > > 
> > > Could you test with this half baked patch, please? I didn't get to test
> > > it myself unfortunately.
> > 
> > 3.16 settled out at about 11.5M faults/sec before the regression.  This
> > patch gets it back up to about 10.5M, which is good.
> 
> Dave, would you be willing to test the following patch as well? I do not
> have a huge machine at hand right now. It would be great if you could

I was playing with a 48-CPU machine with 32G of RAM, but the res_counter
lock didn't show up in the traces much (this was with 96 processes doing
mmap of a 256M private file, fault, unmap in parallel):
                          |--0.75%-- __res_counter_charge
                          |          res_counter_charge
                          |          try_charge
                          |          mem_cgroup_try_charge
                          |          |          
                          |          |--81.56%-- do_cow_fault
                          |          |          handle_mm_fault
                          |          |          __do_page_fault
                          |          |          do_page_fault
                          |          |          page_fault
[...]
                          |          |          
                          |           --18.44%-- __add_to_page_cache_locked
                          |                     add_to_page_cache_lru
                          |                     mpage_readpages
                          |                     ext4_readpages
                          |                     __do_page_cache_readahead
                          |                     ondemand_readahead
                          |                     page_cache_async_readahead
                          |                     filemap_fault
                          |                     __do_fault
                          |                     do_cow_fault
                          |                     handle_mm_fault
                          |                     __do_page_fault
                          |                     do_page_fault
                          |                     page_fault

Nothing really changed in that regard when I reduced the mmap size to 128M
and ran with 4*CPUs processes.
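
For readers who want to generate a similar load, here is a rough sketch of
the kind of workload described above: each process repeatedly mmaps a file
privately, write-faults every page, and unmaps. It is not the program used
for the numbers in this thread. The per-process temporary file, the
one-byte-per-page write, and the names fault_private_mapping and
run_parallel are all assumptions made for illustration, and Python is used
only for brevity (interpreter overhead makes it unsuitable for measuring
absolute fault rates).

```python
import mmap
import multiprocessing
import os
import tempfile

PAGE = mmap.PAGESIZE


def fault_private_mapping(path, length, iterations):
    """mmap `path` MAP_PRIVATE, write one byte per page, unmap; repeat.

    The first write to each page of a private file mapping is a
    copy-on-write fault. Returns the number of pages written.
    """
    touched = 0
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(iterations):
            m = mmap.mmap(fd, length, flags=mmap.MAP_PRIVATE,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
            for off in range(0, length, PAGE):
                m[off] = 1          # write fault on a private file page
                touched += 1
            m.close()               # unmap; the private copies are freed
    finally:
        os.close(fd)
    return touched


def run_parallel(nproc, length, iterations):
    """Run `nproc` faulting processes in parallel.

    Assumption: every process maps its own sparse temporary file.
    Returns the number of processes that exited cleanly.
    """
    ctx = multiprocessing.get_context("fork")

    def worker():
        with tempfile.NamedTemporaryFile() as f:
            f.truncate(length)      # sparse file of the requested size
            fault_private_mapping(f.name, length, iterations)

    procs = [ctx.Process(target=worker) for _ in range(nproc)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sum(p.exitcode == 0 for p in procs)
```

Something like run_parallel(96, 256 << 20, 10) would approximate the
96-process, 256M case. To exercise the !root memcg path, the processes have
to be started inside a non-root memory cgroup.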

I do not have a bigger machine to play with, unfortunately. I think the
patch makes sense on its own. I would really appreciate it if you could
give it a try on your machine with the !root memcg case to see how much it
helps. I would expect results similar to your previous testing without
the revert and with Johannes' patch.

[...]
-- 
Michal Hocko
SUSE Labs