Message-ID: <20130830144650.189850@gmx.com>
Date: Fri, 30 Aug 2013 16:46:50 +0200
From: "Tibor Billes" <tbilles@....com>
To: paulmck@...ux.vnet.ibm.com
Cc: linux-kernel@...r.kernel.org
Subject: Re: Unusually high system CPU usage with recent kernels
> From: Paul E. McKenney Sent: 08/30/13 03:24 AM
> To: Tibor Billes
> Subject: Re: Unusually high system CPU usage with recent kernels
>
> On Tue, Aug 27, 2013 at 10:05:42PM +0200, Tibor Billes wrote:
> > > From: Paul E. McKenney Sent: 08/26/13 06:28 AM
> > > On Sun, Aug 25, 2013 at 09:50:21PM +0200, Tibor Billes wrote:
> > > > From: Paul E. McKenney Sent: 08/24/13 11:03 PM
> > > > > On Sat, Aug 24, 2013 at 09:59:45PM +0200, Tibor Billes wrote:
> > > > > > From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> > > > > > > On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > > > > > > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > > > > > > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > > > > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > > > > > > > I also tried 3.11-rc5, and both show this behaviour. It doesn't show
> > > > > > > > > > > > > > up when the system is idling, only when doing some CPU-intensive work,
> > > > > > > > > > > > > > like compiling with multiple threads. Compiling with only one thread
> > > > > > > > > > > > > > does not seem to trigger it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To be more precise, I did a `perf record -a` while compiling a large
> > > > > > > > > > > > > > C++ program with scons using 4 threads; the result is appended at the
> > > > > > > > > > > > > > end of this email.
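> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (For reference, a minimal sketch of that measurement, assuming the
> > > > > > > > > > > > > > build is run from the project directory:)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > perf record -a -- scons -j4   # system-wide profile while building
> > > > > > > > > > > > > > perf report                   # inspect where the CPU time went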
> > > > > > > > > > > > >
> > > > > > > > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > > > > > > > surprised no one else noticed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Could you please send along your .config file?
> > > > > > > > > > > >
> > > > > > > > > > > > Here it is
> > > > > > > > > > >
> > > > > > > > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > > > > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > > > > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > > > > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > > > > > > >
> > > > > > > > > > > If that helps, there are some things I could try.
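> > > > > > > > > > >
> > > > > > > > > > > (A sketch of how to flip that option, assuming a build from the
> > > > > > > > > > > kernel source tree and the stock scripts/config helper:)
> > > > > > > > > > >
> > > > > > > > > > > scripts/config --disable RCU_FAST_NO_HZ  # CONFIG_RCU_FAST_NO_HZ=n
> > > > > > > > > > > make olddefconfig                        # refresh remaining options
> > > > > > > > > > > make -j4                                 # rebuild the kernel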
> > > > > > > > > >
> > > > > > > > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > > > > > > >
> > > > > > > > > Interesting. Thank you for trying this -- and we at least have a
> > > > > > > > > short-term workaround for this problem. I will put a patch together
> > > > > > > > > for further investigation.
> > > > > > > >
> > > > > > > > I don't specifically need this config option so I'm fine without it in
> > > > > > > > the long term, but I guess it's not supposed to behave like that.
> > > > > > >
> > > > > > > OK, good, we have a long-term workaround for your specific case,
> > > > > > > even better. ;-)
> > > > > > >
> > > > > > > But yes, there are situations where RCU_FAST_NO_HZ needs to work
> > > > > > > a bit better. I hope you will bear with me through a bit more
> > > > > > > testing...
> > > > > > >
> > > > > > > > > In the meantime, could you please tell me how you were measuring
> > > > > > > > > performance for your kernel builds? Wall-clock time required to complete
> > > > > > > > > one build? Number of builds completed per unit time? Something else?
> > > > > > > >
> > > > > > > > Actually, I wasn't being that sophisticated. I have a system monitor
> > > > > > > > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > > > > > > > one of which shows CPU usage. Different colors indicate different
> > > > > > > > kinds of CPU usage. Blue shows user space usage, red shows system
> > > > > > > > usage, and two more show nice and iowait. During a normal compile
> > > > > > > > it's almost completely filled with blue user space CPU usage; only
> > > > > > > > the top few pixels show some iowait and system usage. With
> > > > > > > > CONFIG_RCU_FAST_NO_HZ set, about 3/4 of the graph was red system
> > > > > > > > CPU usage, the rest was blue. So I just looked for a pile of red on
> > > > > > > > my graphs when I tested different kernel builds. But compile speed
> > > > > > > > was also so horrible that I couldn't wait for the build to finish.
> > > > > > > > Even the UI got unresponsive.
> > > > > > >
> > > > > > > We have been having problems with CPU accounting, but this one looks
> > > > > > > quite real.
> > > > > > >
> > > > > > > > Now I did some measuring. In the normal case a compile finished in 36
> > > > > > > > seconds, compiling 315 object files. Here are some output lines from
> > > > > > > > dstat -tasm --vm during the compile:
> > > > > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > > > > > time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used free| used buff cach free|majpf minpf alloc free
> > > > > > > > 21-08 21:48:05| 91 8 2 0 0 0| 0 5852k| 0 0 | 0 0 |1413 1772 | 0 7934M| 581M 58.0M 602M 6553M| 0 71k 46k 54k
> > > > > > > > 21-08 21:48:06| 93 6 1 0 0 0| 0 2064k| 137B 131B| 0 0 |1356 1650 | 0 7934M| 649M 58.0M 604M 6483M| 0 72k 47k 28k
> > > > > > > > 21-08 21:48:07| 86 11 4 0 0 0| 0 5872k| 0 0 | 0 0 |2000 2991 | 0 7934M| 577M 58.0M 627M 6531M| 0 99k 67k 79k
> > > > > > > > 21-08 21:48:08| 87 9 3 0 0 0| 0 2840k| 0 0 | 0 0 |2558 4164 | 0 7934M| 597M 58.0M 632M 6507M| 0 96k 57k 51k
> > > > > > > > 21-08 21:48:09| 93 7 1 0 0 0| 0 3032k| 0 0 | 0 0 |1329 1512 | 0 7934M| 641M 58.0M 626M 6469M| 0 61k 48k 39k
> > > > > > > > 21-08 21:48:10| 93 6 0 0 0 0| 0 4984k| 0 0 | 0 0 |1160 1146 | 0 7934M| 572M 58.0M 628M 6536M| 0 50k 40k 57k
> > > > > > > > 21-08 21:48:11| 86 9 6 0 0 0| 0 2520k| 0 0 | 0 0 |2947 4760 | 0 7934M| 605M 58.0M 631M 6500M| 0 103k 55k 45k
> > > > > > > > 21-08 21:48:12| 90 8 2 0 0 0| 0 2840k| 0 0 | 0 0 |2674 4179 | 0 7934M| 671M 58.0M 635M 6431M| 0 84k 59k 42k
> > > > > > > > 21-08 21:48:13| 90 9 1 0 0 0| 0 4656k| 0 0 | 0 0 |1223 1410 | 0 7934M| 643M 58.0M 638M 6455M| 0 90k 62k 68k
> > > > > > > > 21-08 21:48:14| 91 8 1 0 0 0| 0 3572k| 0 0 | 0 0 |1432 1828 | 0 7934M| 647M 58.0M 641M 6447M| 0 81k 59k 57k
> > > > > > > > 21-08 21:48:15| 91 8 1 0 0 0| 0 5116k| 116B 0 | 0 0 |1194 1295 | 0 7934M| 605M 58.0M 644M 6487M| 0 69k 54k 64k
> > > > > > > > 21-08 21:48:16| 87 10 3 0 0 0| 0 5140k| 0 0 | 0 0 |1761 2586 | 0 7934M| 584M 58.0M 650M 6502M| 0 105k 64k 68k
> > > > > > > >
> > > > > > > > The abnormal case compiled only 182 object files in six and a half
> > > > > > > > minutes, then I stopped it. The same dstat output for this case:
> > > > > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > > > > > time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used free| used buff cach free|majpf minpf alloc free
> > > > > > > > 21-08 22:10:49| 27 62 0 0 11 0| 0 0 | 210B 0 | 0 0 |1414 3137k| 0 7934M| 531M 57.6M 595M 6611M| 0 1628 1250 322
> > > > > > > > 21-08 22:10:50| 25 60 4 0 11 0| 0 88k| 126B 0 | 0 0 |1337 3110k| 0 7934M| 531M 57.6M 595M 6611M| 0 91 128 115
> > > > > > > > 21-08 22:10:51| 26 63 0 0 11 0| 0 184k| 294B 0 | 0 0 |1411 3147k| 0 7934M| 531M 57.6M 595M 6611M| 0 1485 814 815
> > > > > > > > 21-08 22:10:52| 26 63 0 0 11 0| 0 0 | 437B 239B| 0 0 |1355 3160k| 0 7934M| 531M 57.6M 595M 6611M| 0 24 94 97
> > > > > > > > 21-08 22:10:53| 26 63 0 0 11 0| 0 0 | 168B 0 | 0 0 |1397 3155k| 0 7934M| 531M 57.6M 595M 6611M| 0 479 285 273
> > > > > > > > 21-08 22:10:54| 26 63 0 0 11 0| 0 4096B| 396B 324B| 0 0 |1346 3154k| 0 7934M| 531M 57.6M 595M 6611M| 0 27 145 145
> > > > > > > > 21-08 22:10:55| 26 63 0 0 11 0| 0 60k| 0 0 | 0 0 |1353 3148k| 0 7934M| 531M 57.6M 595M 6610M| 0 93 117 36
> > > > > > > > 21-08 22:10:56| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1341 3172k| 0 7934M| 531M 57.6M 595M 6610M| 0 158 87 74
> > > > > > > > 21-08 22:10:57| 26 62 1 0 11 0| 0 0 | 42B 60B| 0 0 |1332 3162k| 0 7934M| 531M 57.6M 595M 6610M| 0 56 82 78
> > > > > > > > 21-08 22:10:58| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1334 3178k| 0 7934M| 531M 57.6M 595M 6610M| 0 26 56 56
> > > > > > > > 21-08 22:10:59| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1336 3179k| 0 7934M| 531M 57.6M 595M 6610M| 0 3 33 32
> > > > > > > > 21-08 22:11:00| 26 63 0 0 11 0| 0 24k| 90B 108B| 0 0 |1347 3172k| 0 7934M| 531M 57.6M 595M 6610M| 0 41 73 71
> > > > > > > >
> > > > > > > > I have four logical cores, so 25% makes up one core. I don't know
> > > > > > > > whether the ~26% user CPU usage has anything to do with this fact or
> > > > > > > > is just a coincidence. The rest is ~63% system and ~11% hardware
> > > > > > > > interrupt. Do these support what you suspect?
> > > > > > >
> > > > > > > The massive increase in context switches does come as a bit of a surprise!
> > > > > > > It does rule out my initial suspicion of lock contention, but then again
> > > > > > > the fact that you have only four CPUs made that pretty unlikely to begin
> > > > > > > with.
> > > > > > >
> > > > > > > 2.4k average context switches in the good case for the full run vs. 3,156k
> > > > > > > for about half of a run in the bad case. That is an increase of more
> > > > > > > than three orders of magnitude!
> > > > > > >
> > > > > > > Yow!!!
> > > > > > >
> > > > > > > Page faults are actually -higher- in the good case. You have about 6.5GB
> > > > > > > free in both cases, so you are not running out of memory. Lots more disk
> > > > > > > writes in the good case, perhaps consistent with its getting more done.
> > > > > > > Networking is negligible in both cases.
> > > > > > >
> > > > > > > Lots of hardware interrupts in the bad case as well. Would you be willing
> > > > > > > to take a look at /proc/interrupts before and after to see which one you
> > > > > > > are getting hit with? (Or whatever interrupt tracking tool you prefer.)
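> > > > > > >
> > > > > > > (For example, a simple snapshot-and-diff sketch:)
> > > > > > >
> > > > > > > cat /proc/interrupts > irq.before
> > > > > > > # ... run the build ...
> > > > > > > cat /proc/interrupts > irq.after
> > > > > > > diff irq.before irq.after   # counters that jumped are the suspects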
> > > > > >
> > > > > > Here are the results.
> > > > > >
> > > > > > Good case before:
> > > > > > CPU0 CPU1 CPU2 CPU3
> > > > > > 0: 17 0 0 0 IO-APIC-edge timer
> > > > > > 1: 356 1 68 4 IO-APIC-edge i8042
> > > > > > 8: 0 0 1 0 IO-APIC-edge rtc0
> > > > > > 9: 330 14 449 71 IO-APIC-fasteoi acpi
> > > > > > 12: 10 108 269 2696 IO-APIC-edge i8042
> > > > > > 16: 36 10 111 2 IO-APIC-fasteoi ehci_hcd:usb1
> > > > > > 17: 20 3 25 4 IO-APIC-fasteoi mmc0
> > > > > > 21: 3 0 34 0 IO-APIC-fasteoi ehci_hcd:usb2
> > > > > > 40: 0 1 12 11 PCI-MSI-edge mei_me
> > > > > > 41: 10617 173 9959 292 PCI-MSI-edge ahci
> > > > > > 42: 862 11 186 26 PCI-MSI-edge xhci_hcd
> > > > > > 43: 107 77 27 102 PCI-MSI-edge i915
> > > > > > 44: 5322 20 434 22 PCI-MSI-edge iwlwifi
> > > > > > 45: 180 0 183 86 PCI-MSI-edge snd_hda_intel
> > > > > > 46: 0 3 0 0 PCI-MSI-edge eth0
> > > > > > NMI: 1 0 0 0 Non-maskable interrupts
> > > > > > LOC: 16312 15177 10840 8995 Local timer interrupts
> > > > > > SPU: 0 0 0 0 Spurious interrupts
> > > > > > PMI: 1 0 0 0 Performance monitoring interrupts
> > > > > > IWI: 1160 523 1031 481 IRQ work interrupts
> > > > > > RTR: 3 0 0 0 APIC ICR read retries
> > > > > > RES: 14976 16135 9973 10784 Rescheduling interrupts
> > > > > > CAL: 482 457 151 370 Function call interrupts
> > > > > > TLB: 70 106 352 230 TLB shootdowns
> > > > > > TRM: 0 0 0 0 Thermal event interrupts
> > > > > > THR: 0 0 0 0 Threshold APIC interrupts
> > > > > > MCE: 0 0 0 0 Machine check exceptions
> > > > > > MCP: 2 2 2 2 Machine check polls
> > > > > > ERR: 0
> > > > > > MIS: 0
> > > > > >
> > > > > > Good case after:
> > > > > > CPU0 CPU1 CPU2 CPU3
> > > > > > 0: 17 0 0 0 IO-APIC-edge timer
> > > > > > 1: 367 1 81 4 IO-APIC-edge i8042
> > > > > > 8: 0 0 1 0 IO-APIC-edge rtc0
> > > > > > 9: 478 14 460 71 IO-APIC-fasteoi acpi
> > > > > > 12: 10 108 269 2696 IO-APIC-edge i8042
> > > > > > 16: 36 10 111 2 IO-APIC-fasteoi ehci_hcd:usb1
> > > > > > 17: 20 3 25 4 IO-APIC-fasteoi mmc0
> > > > > > 21: 3 0 34 0 IO-APIC-fasteoi ehci_hcd:usb2
> > > > > > 40: 0 1 12 11 PCI-MSI-edge mei_me
> > > > > > 41: 16888 173 9959 292 PCI-MSI-edge ahci
> > > > > > 42: 1102 11 186 26 PCI-MSI-edge xhci_hcd
> > > > > > 43: 107 132 27 136 PCI-MSI-edge i915
> > > > > > 44: 6943 20 434 22 PCI-MSI-edge iwlwifi
> > > > > > 45: 180 0 183 86 PCI-MSI-edge snd_hda_intel
> > > > > > 46: 0 3 0 0 PCI-MSI-edge eth0
> > > > > > NMI: 4 3 3 3 Non-maskable interrupts
> > > > > > LOC: 26845 24780 19025 17746 Local timer interrupts
> > > > > > SPU: 0 0 0 0 Spurious interrupts
> > > > > > PMI: 4 3 3 3 Performance monitoring interrupts
> > > > > > IWI: 1637 751 1287 695 IRQ work interrupts
> > > > > > RTR: 3 0 0 0 APIC ICR read retries
> > > > > > RES: 26511 26673 18791 20194 Rescheduling interrupts
> > > > > > CAL: 510 480 151 370 Function call interrupts
> > > > > > TLB: 361 292 575 461 TLB shootdowns
> > > > > > TRM: 0 0 0 0 Thermal event interrupts
> > > > > > THR: 0 0 0 0 Threshold APIC interrupts
> > > > > > MCE: 0 0 0 0 Machine check exceptions
> > > > > > MCP: 2 2 2 2 Machine check polls
> > > > > > ERR: 0
> > > > > > MIS: 0
> > > > > >
> > > > > > Bad case before:
> > > > > > CPU0 CPU1 CPU2 CPU3
> > > > > > 0: 17 0 0 0 IO-APIC-edge timer
> > > > > > 1: 172 3 78 3 IO-APIC-edge i8042
> > > > > > 8: 0 1 0 0 IO-APIC-edge rtc0
> > > > > > 9: 1200 148 395 81 IO-APIC-fasteoi acpi
> > > > > > 12: 1625 2 348 10 IO-APIC-edge i8042
> > > > > > 16: 26 23 115 0 IO-APIC-fasteoi ehci_hcd:usb1
> > > > > > 17: 16 3 12 21 IO-APIC-fasteoi mmc0
> > > > > > 21: 2 2 33 0 IO-APIC-fasteoi ehci_hcd:usb2
> > > > > > 40: 0 0 14 10 PCI-MSI-edge mei_me
> > > > > > 41: 15776 374 8497 687 PCI-MSI-edge ahci
> > > > > > 42: 1297 829 115 24 PCI-MSI-edge xhci_hcd
> > > > > > 43: 103 149 9 212 PCI-MSI-edge i915
> > > > > > 44: 13151 101 511 91 PCI-MSI-edge iwlwifi
> > > > > > 45: 153 159 0 122 PCI-MSI-edge snd_hda_intel
> > > > > > 46: 0 1 1 0 PCI-MSI-edge eth0
> > > > > > NMI: 32 31 31 31 Non-maskable interrupts
> > > > > > LOC: 82504 82732 74172 75985 Local timer interrupts
> > > > > > SPU: 0 0 0 0 Spurious interrupts
> > > > > > PMI: 32 31 31 31 Performance monitoring interrupts
> > > > > > IWI: 17816 16278 13833 13282 IRQ work interrupts
> > > > > > RTR: 3 0 0 0 APIC ICR read retries
> > > > > > RES: 18784 21084 13313 12946 Rescheduling interrupts
> > > > > > CAL: 393 422 306 356 Function call interrupts
> > > > > > TLB: 231 176 235 191 TLB shootdowns
> > > > > > TRM: 0 0 0 0 Thermal event interrupts
> > > > > > THR: 0 0 0 0 Threshold APIC interrupts
> > > > > > MCE: 0 0 0 0 Machine check exceptions
> > > > > > MCP: 3 3 3 3 Machine check polls
> > > > > > ERR: 0
> > > > > > MIS: 0
> > > > > >
> > > > > > Bad case after:
> > > > > > CPU0 CPU1 CPU2 CPU3
> > > > > > 0: 17 0 0 0 IO-APIC-edge timer
> > > > > > 1: 415 3 85 3 IO-APIC-edge i8042
> > > > > > 8: 0 1 0 0 IO-APIC-edge rtc0
> > > > > > 9: 1277 148 428 81 IO-APIC-fasteoi acpi
> > > > > > 12: 1625 2 348 10 IO-APIC-edge i8042
> > > > > > 16: 26 23 115 0 IO-APIC-fasteoi ehci_hcd:usb1
> > > > > > 17: 16 3 12 21 IO-APIC-fasteoi mmc0
> > > > > > 21: 2 2 33 0 IO-APIC-fasteoi ehci_hcd:usb2
> > > > > > 40: 0 0 14 10 PCI-MSI-edge mei_me
> > > > > > 41: 17814 374 8497 687 PCI-MSI-edge ahci
> > > > > > 42: 1567 829 115 24 PCI-MSI-edge xhci_hcd
> > > > > > 43: 103 177 9 242 PCI-MSI-edge i915
> > > > > > 44: 14956 101 511 91 PCI-MSI-edge iwlwifi
> > > > > > 45: 153 159 0 122 PCI-MSI-edge snd_hda_intel
> > > > > > 46: 0 1 1 0 PCI-MSI-edge eth0
> > > > > > NMI: 36 35 34 34 Non-maskable interrupts
> > > > > > LOC: 92429 92708 81714 84071 Local timer interrupts
> > > > > > SPU: 0 0 0 0 Spurious interrupts
> > > > > > PMI: 36 35 34 34 Performance monitoring interrupts
> > > > > > IWI: 22594 19658 17439 14257 IRQ work interrupts
> > > > > > RTR: 3 0 0 0 APIC ICR read retries
> > > > > > RES: 21491 24670 14618 14569 Rescheduling interrupts
> > > > > > CAL: 441 439 306 356 Function call interrupts
> > > > > > TLB: 232 181 274 465 TLB shootdowns
> > > > > > TRM: 0 0 0 0 Thermal event interrupts
> > > > > > THR: 0 0 0 0 Threshold APIC interrupts
> > > > > > MCE: 0 0 0 0 Machine check exceptions
> > > > > > MCP: 3 3 3 3 Machine check polls
> > > > > > ERR: 0
> > > > > > MIS: 0
> > > > >
> > > > > Lots more local timer interrupts, which is consistent with the higher
> > > > > time in interrupt handlers for the bad case.
> > > > >
> > > > > > > One hypothesis is that your workload and configuration are interacting
> > > > > > > with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> > > > > > > Could you please check for this by building with CONFIG_RCU_TRACE=y,
> > > > > > > mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> > > > > > > after each run?
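> > > > > > >
> > > > > > > (A sketch of those steps, assuming debugfs is not already mounted:)
> > > > > > >
> > > > > > > mount -t debugfs none /sys/kernel/debug
> > > > > > > cat /sys/kernel/debug/rcu/rcu_sched/rcugp   # dump before the run
> > > > > > > # ... run the build ...
> > > > > > > cat /sys/kernel/debug/rcu/rcu_sched/rcugp   # dump again after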
> > > > > >
> > > > > > Good case before:
> > > > > > completed=8756 gpnum=8757 age=0 max=21
> > > > > > after:
> > > > > > completed=14686 gpnum=14687 age=0 max=21
> > > > > >
> > > > > > Bad case before:
> > > > > > completed=22970 gpnum=22971 age=0 max=21
> > > > > > after:
> > > > > > completed=26110 gpnum=26111 age=0 max=21
> > > > >
> > > > > In the good case, (14686-8756)/40=148.25 grace periods per second, which
> > > > > is a fast but reasonable rate given your HZ=250. Not a large difference
> > > > > in the number of grace periods, but extrapolating for the longer runtime,
> > > > > maybe ten times as many. But not much change in grace-period rate per
> > > > > unit time.
> > > > >
> > > > > > The test scenario was the following in both cases (mixed English and pseudo-bash):
> > > > > > reboot, login, start terminal
> > > > > > cd project
> > > > > > rm -r build
> > > > > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > > > > scons -j4
> > > > > > wait ~40 sec (good case finished, Ctrl-C in bad case)
> > > > > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > > > >
> > > > > > I stopped the build in the bad case after about the same time the good
> > > > > > case finished, so the extra interrupts and RCU grace periods caused by
> > > > > > the longer runtime don't skew the results.
> > > > >
> > > > > That procedure works for me, thank you for laying it out carefully.
> > > > >
> > > > > I believe I see what is going on and how to fix it, though it may take
> > > > > me a bit to work things through and get a good patch.
> > > > >
> > > > > Thank you very much for your testing efforts!
> > > >
> > > > I'm glad I can help. I've been using Linux for many years, and now I
> > > > have a chance to help the community, to do something in return. I'm
> > > > quite enjoying this :)
> > >
> > > ;-)
> > >
> > > Here is a patch that is more likely to help. I am testing it in parallel,
> > > but figured I should send you a sneak preview.
> >
> > I tried it, but I don't see any difference in overall performance. The
> > dstat output also shows the same as before.
> >
> > But I did notice something. Occasionally user-space CPU usage increases,
> > interrupts and context switches drop, and it really gets more work done
> > (scons printed commands more frequently). I checked that this behaviour is
> > also present without your patch; I just didn't notice it before. Maybe you
> > can make some sense out of it.
> >
> > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > time |usr sys idl wai hiq siq| read writ| recv send| in out | int csw | used free| used buff cach free|majpf minpf alloc free
> > 27-08 20:51:53| 23 62 5 0 11 0| 0 0 | 0 0 | 0 0 |1274 3102k| 0 7934M| 549M 56.0M 491M 6698M| 0 28 156 159
> > 27-08 20:51:54| 24 64 1 0 11 0| 0 0 | 0 0 | 0 0 |1317 3165k| 0 7934M| 549M 56.0M 491M 6698M| 0 53 189 182
> > 27-08 20:51:55| 33 50 6 2 9 0| 192k 1832k| 0 0 | 0 0 |1371 2442k| 0 7934M| 544M 56.0M 492M 6702M| 0 30k 17k 17k
> > 27-08 20:51:56| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1313 3220k| 0 7934M| 544M 56.0M 492M 6701M| 0 21 272 232
> > 27-08 20:51:57| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1319 3226k| 0 7934M| 544M 56.0M 492M 6701M| 0 8 96 112
> > 27-08 20:51:58| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1317 3224k| 0 7934M| 544M 56.0M 492M 6701M| 0 12 145 141
> > 27-08 20:51:59| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1317 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 54 193 191
> > 27-08 20:52:00| 25 63 0 0 12 0| 0 24k| 0 0 | 0 0 |1336 3216k| 0 7934M| 544M 56.0M 492M 6701M| 0 36 161 172
> > 27-08 20:52:01| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1313 3225k| 0 7934M| 544M 56.0M 492M 6701M| 0 9 107 107
> > 27-08 20:52:02| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1327 3224k| 0 7934M| 545M 56.0M 492M 6701M| 0 13 193 200
> > 27-08 20:52:03| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1311 3226k| 0 7934M| 545M 56.0M 492M 6701M| 0 13 114 114
> > 27-08 20:52:04| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1331 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 41 190 178
> > 27-08 20:52:05| 24 64 0 0 12 0| 0 8192B| 0 0 | 0 0 |1315 3222k| 0 7934M| 544M 56.0M 492M 6701M| 0 30 123 122
> > 27-08 20:52:06| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1314 3223k| 0 7934M| 544M 56.0M 492M 6701M| 0 16 187 195
> > 27-08 20:52:07| 25 63 1 0 12 0|2212k 192k| 0 0 | 0 0 |1637 3194k| 0 7934M| 544M 56.2M 494M 6699M| 0 1363 2590 1947
> > 27-08 20:52:08| 17 33 18 26 6 0|3208k 0 | 0 0 | 0 0 |1351 1766k| 0 7934M| 561M 56.3M 499M 6678M| 4 10k 7620 2055
> > 27-08 20:52:09| 47 21 16 13 4 0|4332k 628k| 0 0 | 0 0 |1400 1081k| 0 7934M| 647M 56.3M 504M 6587M| 10 24k 25k 1151
> > 27-08 20:52:10| 36 34 13 11 6 0|2636k 2820k| 0 0 | 0 0 |1451 1737k| 0 7934M| 598M 56.3M 507M 6632M| 5 19k 16k 28k
> > 27-08 20:52:11| 46 17 10 25 3 0|4288k 536k| 0 0 | 0 0 |1386 868k| 0 7934M| 613M 56.3M 513M 6611M| 24 13k 8908 3616
> > 27-08 20:52:12| 53 33 5 4 5 0|4740k 3992k| 0 0 | 0 0 |1773 1464k| 0 7934M| 562M 56.6M 521M 6654M| 0 36k 29k 40k
> > 27-08 20:52:13| 60 34 0 1 6 0|4228k 1416k| 0 0 | 0 0 |1642 1670k| 0 7934M| 593M 56.6M 526M 6618M| 0 36k 26k 17k
> > 27-08 20:52:14| 53 37 1 3 7 0|3008k 1976k| 0 0 | 0 0 |1513 1972k| 0 7934M| 541M 56.6M 529M 6668M| 0 10k 9986 23k
> > 27-08 20:52:15| 55 34 1 4 6 0|3636k 1284k| 0 0 | 0 0 |1645 1688k| 0 7934M| 581M 56.6M 535M 6622M| 0 43k 30k 18k
> > 27-08 20:52:16| 57 30 5 2 5 0|4404k 2320k| 0 0 | 0 0 |1715 1489k| 0 7934M| 570M 56.6M 541M 6627M| 0 39k 24k 26k
> > 27-08 20:52:17| 50 35 3 7 6 0|2520k 1972k| 0 0 | 0 0 |1699 1598k| 0 7934M| 587M 56.6M 554M 6596M| 0 65k 40k 32k
> > 27-08 20:52:18| 52 40 2 1 7 0|1556k 1732k| 0 0 | 0 0 |1582 1865k| 0 7934M| 551M 56.6M 567M 6619M| 0 35k 26k 33k
> > 27-08 20:52:19| 26 62 0 0 12 0| 0 0 | 0 0 | 0 0 |1351 3240k| 0 7934M| 551M 56.6M 568M 6619M| 0 86 440 214
> > 27-08 20:52:20| 26 63 0 0 11 0| 0 108k| 0 0 | 0 0 |1392 3162k| 0 7934M| 555M 56.6M 560M 6623M| 0 1801 1490 2672
> > 27-08 20:52:21| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1332 3198k| 0 7934M| 555M 56.6M 560M 6623M| 0 50 245 255
> > 27-08 20:52:22| 25 63 1 0 12 0| 0 0 | 0 0 | 0 0 |1350 3220k| 0 7934M| 556M 56.6M 560M 6622M| 0 755 544 286
> > 27-08 20:52:23| 27 62 1 0 11 0| 0 272k| 0 0 | 0 0 |1323 3092k| 0 7934M| 551M 56.6M 558M 6628M| 0 341 1464 3085
> > 27-08 20:52:24| 25 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1334 3197k| 0 7934M| 551M 56.6M 559M 6627M| 0 63 514 273
> > 27-08 20:52:25| 25 63 0 0 12 0| 0 40k| 0 0 | 0 0 |1329 3243k| 0 7934M| 546M 56.6M 558M 6633M| 0 321 160 1679
> > 27-08 20:52:26| 39 50 1 1 9 0| 48k 644k| 0 0 | 0 0 |1500 2556k| 0 7934M| 552M 56.6M 560M 6625M| 0 30k 14k 12k
> > 27-08 20:52:27| 26 62 1 0 11 0| 0 192k| 0 0 | 0 0 |1380 3152k| 0 7934M| 553M 56.6M 560M 6624M| 0 2370 808 718
> > 27-08 20:52:28| 23 55 12 0 10 0| 0 0 | 0 0 | 0 0 |1247 2993k| 0 7934M| 553M 56.6M 561M 6624M| 0 1060 428 241
> > 27-08 20:52:29| 25 63 1 0 11 0| 0 0 | 0 0 | 0 0 |1318 3142k| 0 7934M| 554M 56.6M 561M 6623M| 0 663 442 198
> > 27-08 20:52:30| 25 64 0 0 12 0| 0 100k| 0 0 | 0 0 |1316 3212k| 0 7934M| 554M 56.6M 561M 6623M| 0 42 187 186
> > 27-08 20:52:31| 24 64 0 0 12 0| 0 4096B| 0 0 | 0 0 |1309 3222k| 0 7934M| 554M 56.6M 561M 6623M| 0 9 85 85
> > 27-08 20:52:32| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1297 3219k| 0 7934M| 554M 56.6M 561M 6623M| 0 23 95 84
> > 27-08 20:52:33| 25 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1339 3101k| 0 7934M| 555M 56.6M 557M 6625M| 0 923 1566 2176
> > 27-08 20:52:34| 25 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1333 3095k| 0 7934M| 555M 56.6M 559M 6623M| 0 114 701 195
> > 27-08 20:52:35| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1342 3036k| 0 7934M| 557M 56.6M 568M 6613M| 0 957 3225 516
> > 27-08 20:52:36| 26 60 2 0 11 0| 0 28k| 0 0 | 0 0 |1340 3091k| 0 7934M| 555M 56.6M 568M 6614M| 0 5304 5422 5609
> > 27-08 20:52:37| 25 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1348 3260k| 0 7934M| 556M 56.6M 565M 6617M| 0 30 156 1073
> > 27-08 20:52:38| 24 64 0 0 12 0| 0 0 | 0 0 | 0 0 |1314 3211k| 0 7934M| 556M 56.6M 549M 6633M| 0 11 105 4285
> > 27-08 20:52:39| 26 63 0 0 11 0| 0 0 | 0 0 | 0 0 |1353 3031k| 0 7934M| 558M 56.6M 559M 6620M| 0 847 3866 357
> > 27-08 20:52:40| 26 63 0 0 12 0| 0 0 | 0 0 | 0 0 |1309 3135k| 0 7934M| 569M 56.6M 566M 6602M| 0 3940 5727 1288
>
> Interesting. The number of context switches drops during the time
> that throughput improves. It would be good to find out what task(s)
> are doing all of the context switches. One way to find the pids of the
> top 20 context-switching tasks would be something like this:
>
> grep ctxt /proc/*/status | sort -k2nr | head -20
>
> You could use any number of methods to map back to the command.
> When generating my last patch, I was assuming that ksoftirqd would be
> the big offender. Of course, if it is something else, I should be
> taking a different approach.
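>
> (For example, one hypothetical way to map those pids back to commands,
> reading each task's /proc/<pid>/comm:)
>
> grep ctxt /proc/*/status | sort -k2nr | head -20 |
>   awk -F'[/:]' '{print $3}' | sort -u |
>   while read pid; do
>     # print pid and its command name
>     printf '%6s %s\n' "$pid" "$(cat /proc/$pid/comm 2>/dev/null)"
>   done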
I'll measure it after the weekend.
Tibor