[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130903211101.189830@gmx.com>
Date: Tue, 03 Sep 2013 23:11:01 +0200
From: "Tibor Billes" <tbilles@....com>
To: paulmck@...ux.vnet.ibm.com
Cc: linux-kernel@...r.kernel.org
Subject: Re: Unusually high system CPU usage with recent kernels
> From: Paul E. McKenney Sent: 08/30/13 03:24 AM
> On Tue, Aug 27, 2013 at 10:05:42PM +0200, Tibor Billes wrote:
> > From: Paul E. McKenney Sent: 08/26/13 06:28 AM
> > > Here is a patch that is more likely to help. I am testing it in parallel,
> > > but figured I should send you a sneak preview.
> >
> > I tried it, but I don't see any difference in overall performance. The dstat
> > also shows the same as before.
> >
> > But I did notice something. Occasionally there is an increase in userspace
> > CPU usage, interrupts and context switches are dropping, and it really gets
> > more work done (scons printed commands more frequently). I checked that
> > this behaviour is present without your patch, I just didn't notice this
> > before. Maybe you can make some sense out of it.
> >
> > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used free| used buff cach free|majpf minpf alloc free
> > 27-08 20:51:53| 23 62 5 0 11 0| 0 0 | 0 0 | 0 0 |1274 3102k| 0 7934M| 549M 56.0M 491M 6698M| 0 28 156 159
> > 27-08 20:51:54| 24 64 1 0 11 0| 0 0 | 0 0 | 0 0 |1317 3165k| 0 7934M| 549M 56.0M 491M 6698M| 0 53 189 182
> > 27-08 20:51:55| 33 50 6 2 9 0| 192k 1832k| 0 0 | 0 0 |1371 2442k| 0 7934M| 544M 56.0M 492M 6702M| 0 30k 17k 17k
> > 27-08 20:51:56| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1313 3220k| 0 7934M| 544M 56.0M 492M 6701M| 0 21 272 232
> > 27-08 20:51:57| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1319 3226k| 0 7934M| 544M 56.0M 492M 6701M| 0 8 96 112
> > 27-08 20:51:58| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1317 3224k| 0 7934M| 544M 56.0M 492M 6701M| 0 12 145 141
> > 27-08 20:51:59| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1317 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 54 193 191
> > 27-08 20:52:00| 25 63 0 0 12 0| 0 24k| 0 0 | 0 0 |1336 3216k| 0 7934M| 544M 56.0M 492M 6701M| 0 36 161 172
> > 27-08 20:52:01| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1313 3225k| 0 7934M| 544M 56.0M 492M 6701M| 0 9 107 107
> > 27-08 20:52:02| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1327 3224k| 0 7934M| 545M 56.0M 492M 6701M| 0 13 193 200
> > 27-08 20:52:03| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1311 3226k| 0 7934M| 545M 56.0M 492M 6701M| 0 13 114 114
> > 27-08 20:52:04| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1331 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 41 190 178
> > 27-08 20:52:05| 24 64 0 0 12 0| 0 8192B| 0 0 | 0 0 |1315 3222k| 0 7934M| 544M 56.0M 492M 6701M| 0 30 123 122
> > 27-08 20:52:06| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1314 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 16 187 195
> > 27-08 20:52:07| 25 63 1 0 12 0|2212k 192k| 0 0 | 0 0 |1637 3194k| 0 7934M| 544M 56.2M 494M 6699M| 0 1363 2590 1947
> > 27-08 20:52:08| 17 33 18 26 6 0|3208k 0 | 0 0 | 0 0 |1351 1766k| 0 7934M| 561M 56.3M 499M 6678M| 4 10k 7620 2055
> > 27-08 20:52:09| 47 21 16 13 4 0|4332k 628k| 0 0 | 0 0 |1400 1081k| 0 7934M| 647M 56.3M 504M 6587M| 10 24k 25k 1151
> > 27-08 20:52:10| 36 34 13 11 6 0|2636k 2820k| 0 0 | 0 0 |1451 1737k| 0 7934M| 598M 56.3M 507M 6632M| 5 19k 16k 28k
> > 27-08 20:52:11| 46 17 10 25 3 0|4288k 536k| 0 0 | 0 0 |1386 868k| 0 7934M| 613M 56.3M 513M 6611M| 24 13k 8908 3616
> > 27-08 20:52:12| 53 33 5 4 5 0|4740k 3992k| 0 0 | 0 0 |1773 1464k| 0 7934M| 562M 56.6M 521M 6654M| 0 36k 29k 40k
> > 27-08 20:52:13| 60 34 0 1 6 0|4228k 1416k| 0 0 | 0 0 |1642 1670k| 0 7934M| 593M 56.6M 526M 6618M| 0 36k 26k 17k
> > 27-08 20:52:14| 53 37 1 3 7 0|3008k 1976k| 0 0 | 0 0 |1513 1972k| 0 7934M| 541M 56.6M 529M 6668M| 0 10k 9986 23k
> > 27-08 20:52:15| 55 34 1 4 6 0|3636k 1284k| 0 0 | 0 0 |1645 1688k| 0 7934M| 581M 56.6M 535M 6622M| 0 43k 30k 18k
> > 27-08 20:52:16| 57 30 5 2 5 0|4404k 2320k| 0 0 | 0 0 |1715 1489k| 0 7934M| 570M 56.6M 541M 6627M| 0 39k 24k 26k
> > 27-08 20:52:17| 50 35 3 7 6 0|2520k 1972k| 0 0 | 0 0 |1699 1598k| 0 7934M| 587M 56.6M 554M 6596M| 0 65k 40k 32k
> > 27-08 20:52:18| 52 40 2 1 7 0|1556k 1732k| 0 0 | 0 0 |1582 1865k| 0 7934M| 551M 56.6M 567M 6619M| 0 35k 26k 33k
> > 27-08 20:52:19| 26 62 0 0 12 0| 0 0 | 0 0 | 0 0 |1351 3240k| 0 7934M| 551M 56.6M 568M 6619M| 0 86 440 214
> > 27-08 20:52:20| 26 63 0 0 11 0| 0 108k| 0 0 | 0 0 |1392 3162k| 0 7934M| 555M 56.6M 560M 6623M| 0 1801 1490 2672
> > 27-08 20:52:21| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1332 3198k| 0 7934M| 555M 56.6M 560M 6623M| 0 50 245 255
> > 27-08 20:52:22| 25 63 1 0 12 0| 0 0 | 0 0 | 0 0 |1350 3220k| 0 7934M| 556M 56.6M 560M 6622M| 0 755 544 286
> > 27-08 20:52:23| 27 62 1 0 11 0| 0 272k| 0 0 | 0 0 |1323 3092k| 0 7934M| 551M 56.6M 558M 6628M| 0 341 1464 3085
> > 27-08 20:52:24| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1334 3197k| 0 7934M| 551M 56.6M 559M 6627M| 0 63 514 273
> > 27-08 20:52:25| 25 63 0 0 12 0| 0 40k| 0 0 | 0 0 |1329 3243k| 0 7934M| 546M 56.6M 558M 6633M| 0 321 160 1679
> > 27-08 20:52:26| 39 50 1 1 9 0| 48k 644k| 0 0 | 0 0 |1500 2556k| 0 7934M| 552M 56.6M 560M 6625M| 0 30k 14k 12k
> > 27-08 20:52:27| 26 62 1 0 11 0| 0 192k| 0 0 | 0 0 |1380 3152k| 0 7934M| 553M 56.6M 560M 6624M| 0 2370 808 718
> > 27-08 20:52:28| 23 55 12 0 10 0| 0 0 | 0 0 | 0 0 |1247 2993k| 0 7934M| 553M 56.6M 561M 6624M| 0 1060 428 241
> > 27-08 20:52:29| 25 63 1 0 11 0| 0 0 | 0 0 | 0 0 |1318 3142k| 0 7934M| 554M 56.6M 561M 6623M| 0 663 442 198
> > 27-08 20:52:30| 25 64 0 0 12 0| 0 100k| 0 0 | 0 0 |1316 3212k| 0 7934M| 554M 56.6M 561M 6623M| 0 42 187 186
> > 27-08 20:52:31| 24 64 0 0 12 0| 0 4096B| 0 0 | 0 0 |1309 3222k| 0 7934M| 554M 56.6M 561M 6623M| 0 9 85 85
> > 27-08 20:52:32| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1297 3219k| 0 7934M| 554M 56.6M 561M 6623M| 0 23 95 84
> > 27-08 20:52:33| 25 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1339 3101k| 0 7934M| 555M 56.6M 557M 6625M| 0 923 1566 2176
> > 27-08 20:52:34| 25 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1333 3095k| 0 7934M| 555M 56.6M 559M 6623M| 0 114 701 195
> > 27-08 20:52:35| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1342 3036k| 0 7934M| 557M 56.6M 568M 6613M| 0 957 3225 516
> > 27-08 20:52:36| 26 60 2 0 11 0| 0 28k| 0 0 | 0 0 |1340 3091k| 0 7934M| 555M 56.6M 568M 6614M| 0 5304 5422 5609
> > 27-08 20:52:37| 25 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1348 3260k| 0 7934M| 556M 56.6M 565M 6617M| 0 30 156 1073
> > 27-08 20:52:38| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1314 3211k| 0 7934M| 556M 56.6M 549M 6633M| 0 11 105 4285
> > 27-08 20:52:39| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1353 3031k| 0 7934M| 558M 56.6M 559M 6620M| 0 847 3866 357
> > 27-08 20:52:40| 26 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1309 3135k| 0 7934M| 569M 56.6M 566M 6602M| 0 3940 5727 1288
>
> Interesting. The number of context switches drops during the time
> that throughput improves. It would be good to find out what task(s)
> are doing all of the context switches. One way to find the pids of the
> top 20 context-switching tasks should be something like this:
>
> grep ctxt /proc/*/status | sort -k2nr | head -20
>
> You could use any number of methods to map back to the command.
> When generating my last patch, I was assuming that ksoftirqd would be
> the big offender. Of course, if it is something else, I should be
> taking a different approach.
I did some measuring of these context switches. I used the above command
line to get the number of context switches before compiling and about 40
seconds later.
The good case is:
file before after process name
/proc/10/status:voluntary_ctxt_switches: 38108 54327 [rcu_sched]
/proc/19/status:voluntary_ctxt_switches: 31022 38682 [ksoftirqd/2]
/proc/14/status:voluntary_ctxt_switches: 12947 16914 [ksoftirqd/1]
/proc/3/status:voluntary_ctxt_switches: 11061 14885 [ksoftirqd/0]
/proc/194/status:voluntary_ctxt_switches: 10044 11660 [kworker/2:1H]
/proc/1603/status:voluntary_ctxt_switches: 7626 9593 /usr/bin/X
/proc/24/status:voluntary_ctxt_switches: 6415 9571 [ksoftirqd/3]
/proc/20/status:voluntary_ctxt_switches: 5317 8879 [kworker/2:0]
The bad case is:
file before after process name
/proc/3/status:voluntary_ctxt_switches: 82411972 98542227 [ksoftirqd/0]
/proc/23/status:voluntary_ctxt_switches: 79592040 94206823 [ksoftirqd/3]
/proc/13/status:voluntary_ctxt_switches: 79053870 93654161 [ksoftirqd/1]
/proc/18/status:voluntary_ctxt_switches: 78136449 92288688 [ksoftirqd/2]
/proc/2293/status:nonvoluntary_ctxt_switches: 29038881 29038881 mate-panel
/proc/2308/status:nonvoluntary_ctxt_switches: 26607661 26607661 nm-applet
/proc/2317/status:nonvoluntary_ctxt_switches: 15494474 15494474 /usr/lib/polkit-mate/polkit-mate-authentication-agent-1
/proc/2148/status:nonvoluntary_ctxt_switches: 13763674 13763674 x-session-manager
/proc/2985/status:nonvoluntary_ctxt_switches: 12062706 python /usr/bin/scons -j4
/proc/2323/status:nonvoluntary_ctxt_switches: 11581510 11581510 mate-volume-control-applet
/proc/2353/status:nonvoluntary_ctxt_switches: 9213436 9213436 mate-power-manager
/proc/2305/status:nonvoluntary_ctxt_switches: 8328471 8328471 /usr/lib/matecomponent/matecomponent-activation-server
/proc/3041/status:nonvoluntary_ctxt_switches: 7638312 sh -c "/opt/arm-2011.03/bin/arm-none-eabi-gcc" [...]
/proc/3055/status:nonvoluntary_ctxt_switches: 4843977 sh -c "/opt/arm-2011.03/bin/arm-none-eabi-gcc" [...]
/proc/2253/status:nonvoluntary_ctxt_switches: 4449918 4449918 mate-keyring-daemon
The processes for the tables were partly collected by hand so they may
not contain every relevant process. But what I see is that there are a
lot more context switches in the bad case, quite many of these are non
voluntary. The context switch increase is pretty much everywhere, not
just ksoftirqd. I suspect the Mate related processes did their context
switches during bootup/login, which is consistent with the bootup/login
procedure being significantly slower (just like the compilation).
Tibor
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists