linux-kernel - Re: Unusually high system CPU usage with recent kernels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20130826042812.GO3871@linux.vnet.ibm.com>
Date:	Sun, 25 Aug 2013 21:28:12 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Tibor Billes <tbilles@....com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Unusually high system CPU usage with recent kernels

On Sun, Aug 25, 2013 at 09:50:21PM +0200, Tibor Billes wrote:
> From: Paul E. McKenney Sent: 08/24/13 11:03 PM
> > On Sat, Aug 24, 2013 at 09:59:45PM +0200, Tibor Billes wrote:
> > > From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> > > > On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > > > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > > > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > > > > I also tried 3.11-rc5, they both show this behaviour. This behaviour doesn't
> > > > > > > > > > > show up when the system is idling, only when doing some CPU intensive work,
> > > > > > > > > > > like compiling with multiple threads. Compiling with only one thread seems not
> > > > > > > > > > > to trigger this behaviour.
> > > > > > > > > > > 
> > > > > > > > > > > To be more precise I did a `perf record -a` while compiling a large C++ program
> > > > > > > > > > > with scons using 4 threads, the result is appended at the end of this email.
> > > > > > > > > > 
> > > > > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > > > > surprised no one else noticed.
> > > > > > > > > > 
> > > > > > > > > > Could you please send along your .config file?
> > > > > > > > > 
> > > > > > > > > Here it is
> > > > > > > > 
> > > > > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > > > > 
> > > > > > > > If that helps, there are some things I could try.
> > > > > > > 
> > > > > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > > > > 
> > > > > > Interesting. Thank you for trying this -- and we at least have a
> > > > > > short-term workaround for this problem. I will put a patch together
> > > > > > for further investigation.
> > > > > 
> > > > > I don't specifically need this config option so I'm fine without it in
> > > > > the long term, but I guess it's not supposed to behave like that.
> > > > 
> > > > OK, good, we have a long-term workload for your specific case,
> > > > even better. ;-)
> > > > 
> > > > But yes, there are situations where RCU_FAST_NO_HZ needs to work
> > > > a bit better. I hope you will bear with me with a bit more
> > > > testing...
> > > >
> > > > > > In the meantime, could you please tell me how you were measuring
> > > > > > performance for your kernel builds? Wall-clock time required to complete
> > > > > > one build? Number of builds completed per unit time? Something else?
> > > > > 
> > > > > Actually, I wasn't all this sophisticated. I have a system monitor
> > > > > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > > > > one of which shows CPU usage. Different colors indicate different kind
> > > > > of CPU usage. Blue shows user space usage, red shows system usage, and
> > > > > two more for nice and iowait. During a normal compile it's almost
> > > > > completely filled with blue user space CPU usage, only the top few
> > > > > pixels show some iowait and system usage. With CONFIG_RCU_FAST_NO_HZ
> > > > > set, about 3/4 of the graph was red system CPU usage, the rest was
> > > > > blue. So I just looked for a pile of red on my graphs when I tested
> > > > > different kernel builds. But also compile speed was horrible I couldn't
> > > > > wait for the build to finish. Even the UI got unresponsive.
> > > > 
> > > > We have been having problems with CPU accounting, but this one looks
> > > > quite real.
> > > > 
> > > > > Now I did some measuring. In the normal case a compile finished in 36
> > > > > seconds, compiled 315 object files. Here are some output lines from
> > > > > dstat -tasm --vm during compile:
> > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > 21-08 21:48:05| 91   8   2   0   0   0|   0  5852k|   0     0 |   0     0 |1413  1772 |   0  7934M| 581M 58.0M  602M 6553M|   0    71k   46k   54k
> > > > > 21-08 21:48:06| 93   6   1   0   0   0|   0  2064k| 137B  131B|   0     0 |1356  1650 |   0  7934M| 649M 58.0M  604M 6483M|   0    72k   47k   28k
> > > > > 21-08 21:48:07| 86  11   4   0   0   0|   0  5872k|   0     0 |   0     0 |2000  2991 |   0  7934M| 577M 58.0M  627M 6531M|   0    99k   67k   79k
> > > > > 21-08 21:48:08| 87   9   3   0   0   0|   0  2840k|   0     0 |   0     0 |2558  4164 |   0  7934M| 597M 58.0M  632M 6507M|   0    96k   57k   51k
> > > > > 21-08 21:48:09| 93   7   1   0   0   0|   0  3032k|   0     0 |   0     0 |1329  1512 |   0  7934M| 641M 58.0M  626M 6469M|   0    61k   48k   39k
> > > > > 21-08 21:48:10| 93   6   0   0   0   0|   0  4984k|   0     0 |   0     0 |1160  1146 |   0  7934M| 572M 58.0M  628M 6536M|   0    50k   40k   57k
> > > > > 21-08 21:48:11| 86   9   6   0   0   0|   0  2520k|   0     0 |   0     0 |2947  4760 |   0  7934M| 605M 58.0M  631M 6500M|   0   103k   55k   45k
> > > > > 21-08 21:48:12| 90   8   2   0   0   0|   0  2840k|   0     0 |   0     0 |2674  4179 |   0  7934M| 671M 58.0M  635M 6431M|   0    84k   59k   42k
> > > > > 21-08 21:48:13| 90   9   1   0   0   0|   0  4656k|   0     0 |   0     0 |1223  1410 |   0  7934M| 643M 58.0M  638M 6455M|   0    90k   62k   68k
> > > > > 21-08 21:48:14| 91   8   1   0   0   0|   0  3572k|   0     0 |   0     0 |1432  1828 |   0  7934M| 647M 58.0M  641M 6447M|   0    81k   59k   57k
> > > > > 21-08 21:48:15| 91   8   1   0   0   0|   0  5116k| 116B    0 |   0     0 |1194  1295 |   0  7934M| 605M 58.0M  644M 6487M|   0    69k   54k   64k
> > > > > 21-08 21:48:16| 87  10   3   0   0   0|   0  5140k|   0     0 |   0     0 |1761  2586 |   0  7934M| 584M 58.0M  650M 6502M|   0   105k   64k   68k
> > > > > 
> > > > > The abnormal case compiled only 182 object file in 6 and a half minutes,
> > > > > then I stopped it. The same dstat output for this case:
> > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > >     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > 21-08 22:10:49| 27  62   0   0  11   0|   0     0 | 210B    0 |   0     0 |1414  3137k|   0  7934M| 531M 57.6M  595M 6611M|   0  1628  1250   322 
> > > > > 21-08 22:10:50| 25  60   4   0  11   0|   0    88k| 126B    0 |   0     0 |1337  3110k|   0  7934M| 531M 57.6M  595M 6611M|   0    91   128   115 
> > > > > 21-08 22:10:51| 26  63   0   0  11   0|   0   184k| 294B    0 |   0     0 |1411  3147k|   0  7934M| 531M 57.6M  595M 6611M|   0  1485   814   815 
> > > > > 21-08 22:10:52| 26  63   0   0  11   0|   0     0 | 437B  239B|   0     0 |1355  3160k|   0  7934M| 531M 57.6M  595M 6611M|   0    24    94    97 
> > > > > 21-08 22:10:53| 26  63   0   0  11   0|   0     0 | 168B    0 |   0     0 |1397  3155k|   0  7934M| 531M 57.6M  595M 6611M|   0   479   285   273 
> > > > > 21-08 22:10:54| 26  63   0   0  11   0|   0  4096B| 396B  324B|   0     0 |1346  3154k|   0  7934M| 531M 57.6M  595M 6611M|   0    27   145   145 
> > > > > 21-08 22:10:55| 26  63   0   0  11   0|   0    60k|   0     0 |   0     0 |1353  3148k|   0  7934M| 531M 57.6M  595M 6610M|   0    93   117    36 
> > > > > 21-08 22:10:56| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1341  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0   158    87    74 
> > > > > 21-08 22:10:57| 26  62   1   0  11   0|   0     0 |  42B   60B|   0     0 |1332  3162k|   0  7934M| 531M 57.6M  595M 6610M|   0    56    82    78 
> > > > > 21-08 22:10:58| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1334  3178k|   0  7934M| 531M 57.6M  595M 6610M|   0    26    56    56 
> > > > > 21-08 22:10:59| 26  63   0   0  11   0|   0     0 |   0     0 |   0     0 |1336  3179k|   0  7934M| 531M 57.6M  595M 6610M|   0     3    33    32 
> > > > > 21-08 22:11:00| 26  63   0   0  11   0|   0    24k|  90B  108B|   0     0 |1347  3172k|   0  7934M| 531M 57.6M  595M 6610M|   0    41    73    71
> > > > > 
> > > > > I have four logical cores so 25% makes up 1 core. I don't know if the ~26% user CPU usage has anthing to do with this fact or just coincidence. The rest is ~63% system and ~11% hardware interrupt. Do these support what you suspect?
> > > > 
> > > > The massive increase in context switches does come as a bit of a surprise!
> > > > It does rule out my initial suspicion of lock contention, but then again
> > > > the fact that you have only four CPUs made that pretty unlikely to begin
> > > > with.
> > > > 
> > > > 2.4k average context switches in the good case for the full run vs. 3,156k
> > > > for about half of a run in the bad case. That is an increase of more
> > > > than three orders of magnitude!
> > > > 
> > > > Yow!!!
> > > > 
> > > > Page faults are actually -higher- in the good case. You have about 6.5GB
> > > > free in both cases, so you are not running out of memory. Lots more disk
> > > > writes in the good case, perhaps consistent with its getting more done.
> > > > Networking is negligible in both cases.
> > > > 
> > > > Lots of hardware interrupts in the bad case as well. Would you be willing
> > > > to take a look at /proc/interrupts before and after to see which one you
> > > > are getting hit with? (Or whatever interrupt tracking tool you prefer.)
> > > 
> > > Here are the results.
> > > 
> > > Good case before:
> > >           CPU0       CPU1       CPU2       CPU3       
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        356          1         68          4   IO-APIC-edge      i8042
> > >  8:          0          0          1          0   IO-APIC-edge      rtc0
> > >  9:        330         14        449         71   IO-APIC-fasteoi   acpi
> > > 12:         10        108        269       2696   IO-APIC-edge      i8042
> > > 16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> > > 21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          1         12         11   PCI-MSI-edge      mei_me
> > > 41:      10617        173       9959        292   PCI-MSI-edge      ahci
> > > 42:        862         11        186         26   PCI-MSI-edge      xhci_hcd
> > > 43:        107         77         27        102   PCI-MSI-edge      i915
> > > 44:       5322         20        434         22   PCI-MSI-edge      iwlwifi
> > > 45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          3          0          0   PCI-MSI-edge      eth0
> > > NMI:          1          0          0          0   Non-maskable interrupts
> > > LOC:      16312      15177      10840       8995   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:          1          0          0          0   Performance monitoring interrupts
> > > IWI:       1160        523       1031        481   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      14976      16135       9973      10784   Rescheduling interrupts
> > > CAL:        482        457        151        370   Function call interrupts
> > > TLB:         70        106        352        230   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          2          2          2          2   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > > 
> > > Good case after:
> > >           CPU0       CPU1       CPU2       CPU3       
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        367          1         81          4   IO-APIC-edge      i8042
> > >  8:          0          0          1          0   IO-APIC-edge      rtc0
> > >  9:        478         14        460         71   IO-APIC-fasteoi   acpi
> > > 12:         10        108        269       2696   IO-APIC-edge      i8042
> > > 16:         36         10        111          2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         20          3         25          4   IO-APIC-fasteoi   mmc0
> > > 21:          3          0         34          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          1         12         11   PCI-MSI-edge      mei_me
> > > 41:      16888        173       9959        292   PCI-MSI-edge      ahci
> > > 42:       1102         11        186         26   PCI-MSI-edge      xhci_hcd
> > > 43:        107        132         27        136   PCI-MSI-edge      i915
> > > 44:       6943         20        434         22   PCI-MSI-edge      iwlwifi
> > > 45:        180          0        183         86   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          3          0          0   PCI-MSI-edge      eth0
> > > NMI:          4          3          3          3   Non-maskable interrupts
> > > LOC:      26845      24780      19025      17746   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:          4          3          3          3   Performance monitoring interrupts
> > > IWI:       1637        751       1287        695   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      26511      26673      18791      20194   Rescheduling interrupts
> > > CAL:        510        480        151        370   Function call interrupts
> > > TLB:        361        292        575        461   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          2          2          2          2   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > > 
> > > Bad case before:
> > >           CPU0       CPU1       CPU2       CPU3       
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        172          3         78          3   IO-APIC-edge      i8042
> > >  8:          0          1          0          0   IO-APIC-edge      rtc0
> > >  9:       1200        148        395         81   IO-APIC-fasteoi   acpi
> > > 12:       1625          2        348         10   IO-APIC-edge      i8042
> > > 16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> > > 21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          0         14         10   PCI-MSI-edge      mei_me
> > > 41:      15776        374       8497        687   PCI-MSI-edge      ahci
> > > 42:       1297        829        115         24   PCI-MSI-edge      xhci_hcd
> > > 43:        103        149          9        212   PCI-MSI-edge      i915
> > > 44:      13151        101        511         91   PCI-MSI-edge      iwlwifi
> > > 45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          1          1          0   PCI-MSI-edge      eth0
> > > NMI:         32         31         31         31   Non-maskable interrupts
> > > LOC:      82504      82732      74172      75985   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:         32         31         31         31   Performance monitoring interrupts
> > > IWI:      17816      16278      13833      13282   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      18784      21084      13313      12946   Rescheduling interrupts
> > > CAL:        393        422        306        356   Function call interrupts
> > > TLB:        231        176        235        191   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          3          3          3          3   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > > 
> > > Bad case after:
> > >           CPU0       CPU1       CPU2       CPU3       
> > >  0:         17          0          0          0   IO-APIC-edge      timer
> > >  1:        415          3         85          3   IO-APIC-edge      i8042
> > >  8:          0          1          0          0   IO-APIC-edge      rtc0
> > >  9:       1277        148        428         81   IO-APIC-fasteoi   acpi
> > > 12:       1625          2        348         10   IO-APIC-edge      i8042
> > > 16:         26         23        115          0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > 17:         16          3         12         21   IO-APIC-fasteoi   mmc0
> > > 21:          2          2         33          0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > 40:          0          0         14         10   PCI-MSI-edge      mei_me
> > > 41:      17814        374       8497        687   PCI-MSI-edge      ahci
> > > 42:       1567        829        115         24   PCI-MSI-edge      xhci_hcd
> > > 43:        103        177          9        242   PCI-MSI-edge      i915
> > > 44:      14956        101        511         91   PCI-MSI-edge      iwlwifi
> > > 45:        153        159          0        122   PCI-MSI-edge      snd_hda_intel
> > > 46:          0          1          1          0   PCI-MSI-edge      eth0
> > > NMI:         36         35         34         34   Non-maskable interrupts
> > > LOC:      92429      92708      81714      84071   Local timer interrupts
> > > SPU:          0          0          0          0   Spurious interrupts
> > > PMI:         36         35         34         34   Performance monitoring interrupts
> > > IWI:      22594      19658      17439      14257   IRQ work interrupts
> > > RTR:          3          0          0          0   APIC ICR read retries
> > > RES:      21491      24670      14618      14569   Rescheduling interrupts
> > > CAL:        441        439        306        356   Function call interrupts
> > > TLB:        232        181        274        465   TLB shootdowns
> > > TRM:          0          0          0          0   Thermal event interrupts
> > > THR:          0          0          0          0   Threshold APIC interrupts
> > > MCE:          0          0          0          0   Machine check exceptions
> > > MCP:          3          3          3          3   Machine check polls
> > > ERR:          0
> > > MIS:          0
> > 
> > Lots more local timer interrupts, which is consistent with the higher
> > time in interrupt handlers for the bad case.
> > 
> > > > One hypothesis is that your workload and configuration are interacting
> > > > with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> > > > Could you please check for this by building with CONFIG_RCU_TRACE=y,
> > > > mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> > > > after each run?
> > > 
> > > Good case before:
> > > completed=8756  gpnum=8757  age=0  max=21
> > > after:
> > > completed=14686  gpnum=14687  age=0  max=21
> > > 
> > > Bad case before:
> > > completed=22970  gpnum=22971  age=0  max=21
> > > after:
> > > completed=26110  gpnum=26111  age=0  max=21
> > 
> > In the good case, (14686-8756)/40=148.25 grace periods per second, which
> > is a fast but reasonable rate given your HZ=250. Not a large difference
> > in the number of grace periods, but extrapolating for the longer runtime,
> > maybe ten times as much. But not much change in grace-period rate per
> > unit time.
> > 
> > > The test scenario was the following in both cases (mixed english and pseudo-bash):
> > > reboot, login, start terminal
> > > cd project
> > > rm -r build
> > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > scons -j4
> > > wait ~40 sec (good case finished, Ctrl-C in bad case)
> > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > 
> > > I stopped the build in the bad case after about the same time the good
> > > case finished, so the extra interrupts and RCU grace periods because of the
> > > longer runtime don't fake the results.
> > 
> > That procedure works for me, thank you for laying it out carefully.
> > 
> > I believe I see what is going on and how to fix it, though it may take
> > me a bit to work things through and get a good patch.
> > 
> > Thank you very much for your testing efforts!
> 
> I'm glad I can help. I've been using Linux for many years, now I have a
> chance to help the community, to do something in return. I'm quite
> enjoying this :)

;-)

Here is a patch that is more likely to help.  I am testing it in parallel,
but figured I should send you a sneak preview.

								Thanx, Paul

------------------------------------------------------------------------

rcu: Throttle rcu_try_advance_all_cbs() execution

The rcu_try_advance_all_cbs() function is invoked on each attempted
entry to and every exit from idle.  If this function determines that
there are callbacks ready to invoke, the caller will invoke the RCU
core, which in turn will result in a pair of context switches.  If a
CPU enters and exits idle extremely frequently, this can result in
an excessive number of context switches and high CPU overhead.

This commit therefore causes rcu_try_advance_all_cbs() to throttle
itself, refusing to do work more than once per jiffy.

Reported-by: Tibor Billes <tbilles@....com>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 5f97eab..52be957 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -104,6 +104,8 @@ struct rcu_dynticks {
 				    /* idle-period nonlazy_posted snapshot. */
 	unsigned long last_accelerate;
 				    /* Last jiffy CBs were accelerated. */
+	unsigned long last_advance_all;
+				    /* Last jiffy CBs were all advanced. */
 	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index a538e73..2205751 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1630,17 +1630,23 @@ module_param(rcu_idle_lazy_gp_delay, int, 0644);
 extern int tick_nohz_enabled;
 
 /*
- * Try to advance callbacks for all flavors of RCU on the current CPU.
- * Afterwards, if there are any callbacks ready for immediate invocation,
- * return true.
+ * Try to advance callbacks for all flavors of RCU on the current CPU, but
+ * only if it has been awhile since the last time we did so.  Afterwards,
+ * if there are any callbacks ready for immediate invocation, return true.
  */
 static bool rcu_try_advance_all_cbs(void)
 {
 	bool cbs_ready = false;
 	struct rcu_data *rdp;
+	struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
 	struct rcu_node *rnp;
 	struct rcu_state *rsp;
 
+	/* Exit early if we advanced recently. */
+	if (jiffies == rdtp->last_advance_all)
+		return 0;
+	rdtp->last_advance_all = jiffies;
+
 	for_each_rcu_flavor(rsp) {
 		rdp = this_cpu_ptr(rsp->rda);
 		rnp = rdp->mynode;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/