Message-ID: <20200223141147.GA53531@shbuild999.sh.intel.com>
Date: Sun, 23 Feb 2020 22:11:47 +0800
From: Feng Tang <feng.tang@...el.com>
To: Jiri Olsa <jolsa@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
kernel test robot <rong.a.chen@...el.com>,
Ingo Molnar <mingo@...nel.org>,
Vince Weaver <vincent.weaver@...ne.edu>,
Jiri Olsa <jolsa@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
Ravi Bangoria <ravi.bangoria@...ux.ibm.com>,
Stephane Eranian <eranian@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
andi.kleen@...el.com, ying.huang@...el.com
Subject: Re: [LKP] Re: [perf/x86] 81ec3f3c4c: will-it-scale.per_process_ops
-5.5% regression
Hi Jiri,
On Fri, Feb 21, 2020 at 02:20:48PM +0100, Jiri Olsa wrote:
> > We are also curious that the commit seems to be completely not
> > relative to this scalability test of signal, which starts a task
> > for each online CPU, and keeps calling raise(), and calculating
> > the run numbers.
> >
> > One experiment we did is checking which part of the commit
> > really affects the test, and it turned out to be the change of
> > "struct pmu". Effectively, applying this patch upon 5.0-rc6
> > which triggers the same regression.
> > So likely, this commit changes the layout of the kernel text
> > and data, which may trigger some cacheline level change. From
> > the system map of the 2 kernels, a big trunk of symbol's address
> > changes which follow the global "pmu",
>
> nice, I wonder we could see that in perf c2c output ;-)
> I'll try to run and check
Thanks for the "perf c2c" suggestion.
I ran perf-c2c on one platform (not the one that shows the
5.5% regression), and found that the main "hitm" hot spot is the
"root_user" global data: there is a task on each CPU doing
the signal stress test, and both __sigqueue_alloc() and
__sigqueue_free() call get_uid() and free_uid() to inc/dec
root_user's refcount.
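To make the access pattern concrete, here is a minimal user-space sketch of it. It is not the kernel code: "demo_user", "demo_get_uid()", "demo_free_uid()" and "demo_signal_roundtrip()" are hypothetical stand-ins, and only the reference count is modeled. Every signal round trip on every CPU does one atomic increment and one atomic decrement on the same shared counter, which is why its cacheline shows up as the "hitm" hot spot.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's user_struct; only the
 * reference count is modeled here. */
struct demo_user {
    atomic_int refcount;
};

/* One shared instance, playing the role of the global root_user. */
static struct demo_user demo_root_user = { .refcount = 1 };

/* Models get_uid(): take a reference on the shared user. */
static struct demo_user *demo_get_uid(struct demo_user *u)
{
    atomic_fetch_add_explicit(&u->refcount, 1, memory_order_relaxed);
    return u;
}

/* Models free_uid(): drop a reference.
 * Returns 1 if this call dropped the last reference, else 0. */
static int demo_free_uid(struct demo_user *u)
{
    int old = atomic_fetch_sub_explicit(&u->refcount, 1,
                                        memory_order_acq_rel);
    assert(old > 0); /* dropping more references than were taken is a bug */
    return old == 1;
}

/* One iteration of the stress loop: queueing a signal takes a
 * reference (the __sigqueue_alloc() side) and delivering it drops
 * that reference again (the __sigqueue_free() side). */
static void demo_signal_roundtrip(struct demo_user *u)
{
    struct demo_user *held = demo_get_uid(u);
    demo_free_uid(held);
}
```

With one task per CPU running demo_signal_roundtrip() in a loop, all CPUs keep writing the line that holds demo_root_user.refcount, so anything else placed on that line is hit by the same traffic.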
Then I added some alignment inside struct "user_struct" (for
"root_user"), and the -5.5% is gone, replaced by a +2.6% gain.
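As a rough illustration of what such an alignment change does, the sketch below compares two hypothetical layouts in plain C11. The struct names, the field names and the 64-byte line size are all assumptions made for the example; which fields were actually aligned in "user_struct" is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines, as on current x86 parts. */
#define DEMO_CACHELINE 64

/* Hypothetical unpadded layout: the hot counter and the fields
 * that follow it land on the same cacheline. */
struct demo_user_packed {
    atomic_int refcount;
    long other_fields[4];
};

/* Hypothetical padded layout: the hot counter gets a cacheline of
 * its own, and the remaining fields start on the next line. */
struct demo_user_aligned {
    _Alignas(DEMO_CACHELINE) atomic_int refcount;
    _Alignas(DEMO_CACHELINE) long other_fields[4];
};

/* Returns 1 if two offsets (relative to a cacheline-aligned base)
 * fall on the same cacheline, else 0. */
static int demo_same_line(size_t a, size_t b)
{
    return a / DEMO_CACHELINE == b / DEMO_CACHELINE;
}

/* Compile-time checks of the padded layout. */
static_assert(offsetof(struct demo_user_aligned, other_fields)
              == DEMO_CACHELINE, "other fields start on the next line");
static_assert(sizeof(struct demo_user_aligned) % DEMO_CACHELINE == 0,
              "the struct occupies whole cachelines");
```

Here demo_same_line(offsetof(struct demo_user_packed, refcount), offsetof(struct demo_user_packed, other_fields)) returns 1, while the same check on demo_user_aligned returns 0. The cost is a larger object, which matters little for a single global such as "root_user".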
One c2c report log is attached.
One thing I don't understand is that this -5.5% only happens on
one 2-socket, 96C/192T Cascade Lake platform, although we've run
the same test on several different platforms. In theory,
the false sharing should take effect on those as well?
Thanks,
Feng
View attachment "c2c_wis_sig_32T.log" of type "text/plain" (173969 bytes)