Message-ID: <20071128112038.GG22039@elte.hu>
Date:	Wed, 28 Nov 2007 12:20:38 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	bdupree@...hfinesse.com, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Dynticks Causing High Context Switch Rate in ksoftirqd


* Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Mon, 26 Nov 2007 20:36:32 -0600 (CST) bdupree@...hfinesse.com wrote:
> 
> > Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
> > system? The problem occurs if I configure the kernel with tickless
> > support (i.e. CONFIG_TICK_ONESHOT=y).  (Thanks to "oprofile" for putting me
> > onto this.)
> 
> Beware that oprofile can provide misleading results on a partially-idle
> system.  You may have discovered that ksoftirqd is consuming 5-10% of the
> non-idle time on that idle system, which is less surprising.
> 
> > I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
> > 2.6.23.9
> > 
> > **************************************************************************
> > *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> > *** This is on an idle machine!                                        ***
> > **************************************************************************
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  0  0      0 1925556   4768 116104    0    0   124     2    6  7538  1  2 96  1
> >  0  0      0 1925556   4768 116104    0    0     0     0    2 147329  0  1 99  0
> >  0  0      0 1925548   4768 116104    0    0     0     0    0 154515  0  1 99  0
> >  0  0      0 1925548   4768 116104    0    0     0     0    1 153898  0  2 98  0
> >  0  0      0 1925548   4780 116104    0    0     0    16    3 155216  0  1 99  0
> >  0  0      0 1925548   4780 116104    0    0     0     0    1 161718  0  1 99  0
> >  0  0      0 1925548   4780 116104    0    0     0     0    0 147587  0  2 98  0
> >  0  0      0 1925548   4780 116104    0    0     0     0    1 153524  0  2 98  0
> >  0  0      0 1925448   4780 116104    0    0     0     0    0 153434  0  1 99  0
> >  0  0      0 1925448   4792 116092    0    0     0    16    4 153527  0  2 98  0
> 
> So what piece of code is scheduling so much?  What does `top' say?  
> What does the (sorted) output of oprofile look like?
> 
> Did you try shutting down as much userspace code as possible to find 
> out if some userspace task is misbehaving?
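
(As an aside on the CONFIG_TICK_ONESHOT question above: whether the
running kernel was actually built tickless can be checked from the shell.
A minimal sketch, assuming the kernel was built with CONFIG_IKCONFIG_PROC
so that /proc/config.gz exists:

  # Show the tickless-related options of the running kernel:
  zcat /proc/config.gz | grep -E 'CONFIG_(NO_HZ|TICK_ONESHOT)'

On kernels without /proc/config.gz, grepping /boot/config-$(uname -r)
gives the same answer, provided the distro ships the config file.)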
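
To put numbers behind "what piece of code is scheduling so much", the
per-task context-switch counters in /proc can narrow it down before
reaching for a profiler. A minimal sketch, assuming a 2.6.23-era kernel
that exposes the voluntary_ctxt_switches/nonvoluntary_ctxt_switches
fields in /proc/<pid>/status:

  # Sum both context-switch counters for every task and list the top
  # ten switchers; on the system above, ksoftirqd should float to the top.
  for p in /proc/[0-9]*; do
      awk -v pid="${p##*/}" '
          /^Name:/        { name = $2 }
          /ctxt_switches/ { n += $2 }
          END             { if (n) print n, pid, name }
      ' "$p/status" 2>/dev/null
  done | sort -rn | head

Sampling this twice a few seconds apart and diffing the counts shows who
is switching now, rather than since boot.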

Such 'what the heck is happening' problems can also be debugged via the 
tracer. Here's a quickstart:

  http://redhat.com/~mingo/latency-tracing-patches/tracing-QuickStart.txt
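
For flavour, the workflow it describes boils down to something like the
following (a rough sketch using the debugfs interface of ftrace, the
mainline descendant of these tracing patches; the 2.6.23-era patch set
itself predates that interface, so the exact paths differ):

  mount -t debugfs none /sys/kernel/debug
  cd /sys/kernel/debug/tracing
  echo function > current_tracer   # trace all kernel function calls
  sleep 2                          # let the "idle" machine run
  echo nop > current_tracer        # stop tracing
  head -100 trace                  # see what keeps waking the CPU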

	Ingo