lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201112141654.GC12240@dhcp22.suse.cz>
Date:   Thu, 12 Nov 2020 15:16:54 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
        Waiman Long <longman@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Chris Down <chris@...isdown.name>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <guro@...com>, Tejun Heo <tj@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Yafang Shao <laoar.shao@...il.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, zhengjun.xing@...el.com, ying.huang@...el.com
Subject: Re: [LKP] Re: [mm/memcg] bd0b230fe1: will-it-scale.per_process_ops
 -22.7% regression

On Thu 12-11-20 20:28:44, Feng Tang wrote:
> Hi Michal,
> 
> On Wed, Nov 04, 2020 at 09:15:46AM +0100, Michal Hocko wrote:
> > > > > Hi Michal,
> > > > > 
> > > > > We used the default configure of cgroups, not sure what configuration you
> > > > > want,
> > > > > could you give me more details? and here is the cgroup info of will-it-scale
> > > > > process:
> > > > > 
> > > > > $ cat /proc/3042/cgroup
> > > > > 12:hugetlb:/
> > > > > 11:memory:/system.slice/lkp-bootstrap.service
> > > > 
> > > > OK, this means that memory controler is enabled and in use. Btw. do you
> > > > get the original performance if you add one phony page_counter after the
> > > > union?
> > > > 
> > > I add one phony page_counter after the union and re-test, the regression
> > > reduced to -1.2%. It looks like the regression caused by the data structure
> > > layout change.
> > 
> > Thanks for double checking. Could you try to cache align the
> > page_counter struct? If that helps then we should figure which counters
> > acks against each other by adding the alignement between the respective
> > counters. 
> 
> We tried below patch to make the 'page_counter' aligned.
>   
>   diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
>   index bab7e57..9efa6f7 100644
>   --- a/include/linux/page_counter.h
>   +++ b/include/linux/page_counter.h
>   @@ -26,7 +26,7 @@ struct page_counter {
>    	/* legacy */
>    	unsigned long watermark;
>    	unsigned long failcnt;
>   -};
>   +} ____cacheline_internodealigned_in_smp;
>    
> and with it, the -22.7% peformance change turns to a small -1.7%, which
> confirms the performance bump is caused by the change to data alignment.
> 
> After the patch, size of 'page_counter' increases from 104 bytes to 128
> bytes, and the size of 'mem_cgroup' increases from 2880 bytes to 3008
> bytes(with our kernel config). Another major data structure which
> contains 'page_counter' is 'hugetlb_cgroup', whose size will change
> from 912B to 1024B.
> 
> Should we make these page_counters aligned to reduce cacheline conflict?

I would rather focus on a more effective mem_cgroup layout. It is very
likely that we are just stumbling over two counters here.

Could you try to add cache alignment of counters after memory and see
which one makes the difference? I do not expect memsw to be the one
because that one is used together with the main counter. But who knows
maybe the way it crosses the cache line has the exact effect. Hard to
tell without other numbers.

Btw. it would be great to see what the effect is on cgroup v2 as well.

Thanks for pursuing this!
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ