linux-kernel - Re: Unusually high system CPU usage with recent kernels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130823132026.116850@gmx.com>
Date:	Fri, 23 Aug 2013 15:20:25 +0200
From:	"Tibor Billes" <tbilles@....com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Unusually high system CPU usage with recent kernels

> From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > I also tried 3.11-rc5, they both show this behaviour. This behaviour doesn't
> > > > > > > > show up when the system is idling, only when doing some CPU intensive work,
> > > > > > > > like compiling with multiple threads. Compiling with only one thread seems not
> > > > > > > > to trigger this behaviour.
> > > > > > > > 
> > > > > > > > To be more precise I did a `perf record -a` while compiling a large C++ program
> > > > > > > > with scons using 4 threads, the result is appended at the end of this email.
> > > > > > > 
> > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > surprised no one else noticed.
> > > > > > > 
> > > > > > > Could you please send along your .config file?
> > > > > > 
> > > > > > Here it is
> > > > > 
> > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > 
> > > > > If that helps, there are some things I could try.
> > > > 
> > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > 
> > > Interesting. Thank you for trying this -- and we at least have a
> > > short-term workaround for this problem. I will put a patch together
> > > for further investigation.
> > 
> > I don't specifically need this config option so I'm fine without it in
> > the long term, but I guess it's not supposed to behave like that.
> 
> OK, good, we have a long-term workload for your specific case,
> even better. ;-)
> 
> But yes, there are situations where RCU_FAST_NO_HZ needs to work
> a bit better. I hope you will bear with me with a bit more
> testing...

Don't worry, I will :) Unfortunately I didn't have time yesterday and I
won't have time today either. But I'll do what you asked tomorrow and I'll
send you the results.

Tibor

> > > In the meantime, could you please tell me how you were measuring
> > > performance for your kernel builds? Wall-clock time required to complete
> > > one build? Number of builds completed per unit time? Something else?
> > 
> > Actually, I wasn't all this sophisticated. I have a system monitor
> > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > one of which shows CPU usage. Different colors indicate different kind
> > of CPU usage. Blue shows user space usage, red shows system usage, and
> > two more for nice and iowait. During a normal compile it's almost
> > completely filled with blue user space CPU usage, only the top few
> > pixels show some iowait and system usage. With CONFIG_RCU_FAST_NO_HZ
> > set, about 3/4 of the graph was red system CPU usage, the rest was
> > blue. So I just looked for a pile of red on my graphs when I tested
> > different kernel builds. But also compile speed was horrible I couldn't
> > wait for the build to finish. Even the UI got unresponsive.
> 
> We have been having problems with CPU accounting, but this one looks
> quite real.
> 
> > Now I did some measuring. In the normal case a compile finished in 36
> > seconds, compiled 315 object files. Here are some output lines from
> > dstat -tasm --vm during compile:
> > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > 21-08 21:48:05| 91   8   2   0   0   0|   0  5852k|   0     0 |   0     0 |1413  1772 |   0  7934M| 581M 58.0M  602M 6553M|   0    71k   46k   54k
> > 21-08 21:48:06| 93   6   1   0   0   0|   0  2064k| 137B  131B|   0     0 |1356  1650 |   0  7934M| 649M 58.0M  604M 6483M|   0    72k   47k   28k
> > 21-08 21:48:07| 86  11   4   0   0   0|   0  5872k|   0     0 |   0     0 |2000  2991 |   0  7934M| 577M 58.0M  627M 6531M|   0    99k   67k   79k
> > 21-08 21:48:08| 87   9   3   0   0   0|   0  2840k|   0     0 |   0     0 |2558  4164 |   0  7934M| 597M 58.0M  632M 6507M|   0    96k   57k   51k
> > 21-08 21:48:09| 93   7   1   0   0   0|   0  3032k|   0     0 |   0     0 |1329  1512 |   0  7934M| 641M 58.0M  626M 6469M|   0    61k   48k   39k
> > 21-08 21:48:10| 93   6   0   0   0   0|   0  4984k|   0     0 |   0     0 |1160  1146 |   0  7934M| 572M 58.0M  628M 6536M|   0    50k   40k   57k
> > 21-08 21:48:11| 86   9   6   0   0   0|   0  2520k|   0     0 |   0     0 |2947  4760 |   0  7934M| 605M 58.0M  631M 6500M|   0   103k   55k   45k
> > 21-08 21:48:12| 90   8   2   0   0   0|   0  2840k|   0     0 |   0     0 |2674  4179 |   0  7934M| 671M 58.0M  635M 6431M|   0    84k   59k   42k
> > 21-08 21:48:13| 90   9   1   0   0   0|   0  4656k|   0     0 |   0     0 |1223  1410 |   0  7934M| 643M 58.0M  638M 6455M|   0    90k   62k   68k
> > 21-08 21:48:14| 91   8   1   0   0   0|   0  3572k|   0     0 |   0     0 |1432  1828 |   0  7934M| 647M 58.0M  641M 6447M|   0    81k   59k   57k
> > 21-08 21:48:15| 91   8   1   0   0   0|   0  5116k| 116B    0 |   0     0 |1194  1295 |   0  7934M| 605M 58.0M  644M 6487M|   0    69k   54k   64k
> > 21-08 21:48:16| 87  10   3   0   0   0|   0  5140k|   0     0 |   0     0 |1761  2586 |   0  7934M| 584M 58.0M  650M 6502M|   0   105k   64k   68k
> > 
> > The abnormal case compiled only 182 object file in 6 and a half minutes,
> > then I stopped it. The same dstat output for this case:
> > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > 21-08 22:10:49| 27  62   0   0  11   0|   0     0 | 210B    0 |   0     0 |1414  3137k|   0  7934M| 531M 57.6M  595M 6611M|   0  1628  1250   322 
> > 21-08 22:10:50| 25  60   4   0  11   0|   0    88k| 126B    0 |   0     0 |1337  3110k|   0  7934M| 531M 57.6M  595M 6611M|   0    91   128   115 
> > 21-08 22:10:51| 26  63   0   0  11   0|   0   184k| 294B    0 |   0     0 |1411  3147k|   0  7934M| 531M 57.6M  595M 6611M|   0  1485   814   815 
> > 21-08 22:10:52| 26  63   0   0  11   0|   0     0 | 437B  239B|   0     0 |1355  3160k|   0  7934M| 531M 57.6M  595M 6611M|   0    24    94    97 
> > 21-08 22:10:53| 26  63   0   0  11   0|   0     0 | 168B    0 |   0     0 |1397  3155k|   0  7934M| 531M 57.6M  595M 6611M|   0   479   285   273 
> > 21-08 22:10:54| 26  63   0   0  11   0|   0  4096B| 396B  324B|   0     0 |1346  3154k|   0  7934M| 531M 57.6M  595M 6611M|   0    27   145   145 
> > 21-08 22:10:55| 26  63   0   0  11   0|   0    60k|   0     0 |   0     0 |1353  3148k|   0  7934M| 531M 57.6M  595M 6610M|   0    93   117    36 
> > 21-08 22:10:56| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1341  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0   158    87    74 
> > 21-08 22:10:57| 26  62   1   0  11   0|   0     0 |  42B   60B|   0     0 |1332  3162k|   0  7934M| 531M 57.6M  595M 6610M|   0    56    82    78 
> > 21-08 22:10:58| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1334  3178k|   0  7934M| 531M 57.6M  595M 6610M|   0    26    56    56 
> > 21-08 22:10:59| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1336  3179k|   0  7934M| 531M 57.6M  595M 6610M|   0     3    33    32 
> > 21-08 22:11:00| 26  63   0   0  11   0|   0    24k|  90B  108B|   0     0 |1347  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0    41    73    71
> > 
> > I have four logical cores so 25% makes up 1 core. I don't know if the ~26% user CPU usage has anthing to do with this fact or just coincidence. The rest is ~63% system and ~11% hardware interrupt. Do these support what you suspect?
> 
> The massive increase in context switches does come as a bit of a surprise!
> It does rule out my initial suspicion of lock contention, but then again
> the fact that you have only four CPUs made that pretty unlikely to begin
> with.
> 
> 2.4k average context switches in the good case for the full run vs. 3,156k
> for about half of a run in the bad case. That is an increase of more
> than three orders of magnitude!
> 
> Yow!!!
> 
> Page faults are actually -higher- in the good case. You have about 6.5GB
> free in both cases, so you are not running out of memory. Lots more disk
> writes in the good case, perhaps consistent with its getting more done.
> Networking is negligible in both cases.
> 
> Lots of hardware interrupts in the bad case as well. Would you be willing
> to take a look at /proc/interrupts before and after to see which one you
> are getting hit with? (Or whatever interrupt tracking tool you prefer.)
> 
> One hypothesis is that your workload and configuration are interacting
> with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> Could you please check for this by building with CONFIG_RCU_TRACE=y,
> mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> after each run?
> 
> Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/