[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <496F8421.3070703@novell.com>
Date: Thu, 15 Jan 2009 13:44:49 -0500
From: Gregory Haskins <ghaskins@...ell.com>
To: Steven Rostedt <srostedt@...hat.com>
CC: Matthew Wilcox <matthew@....cx>,
Andrew Morton <akpm@...ux-foundation.org>,
James Bottomley <James.Bottomley@...senPartnership.com>,
"Wilcox, Matthew R" <matthew.r.wilcox@...el.com>,
chinang.ma@...el.com, linux-kernel@...r.kernel.org,
sharad.c.tripathi@...el.com, arjan@...ux.intel.com,
andi.kleen@...el.com, suresh.b.siddha@...el.com,
harita.chilukuri@...el.com, douglas.w.styner@...el.com,
peter.xihong.wang@...el.com, hubert.nueckel@...el.com,
chris.mason@...cle.com, linux-scsi@...r.kernel.org,
Andrew Vasquez <andrew.vasquez@...gic.com>,
Anirban Chakraborty <anirban.chakraborty@...gic.com>
Subject: Re: Mainline kernel OLTP performance update
Steven Rostedt wrote:
> On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
>
>> On Thu, Jan 15, 2009 at 09:44:42AM -0800, Andrew Morton wrote:
>>
>>>> Me too. Anecdotally, I haven't noticed this in my lab machines, but
>>>> what I have noticed is on someone else's laptop (a hyperthreaded atom)
>>>> that I was trying to demo powertop on was that IPI reschedule interrupts
>>>> seem to be out of control ... they were ticking over at a really high
>>>> rate and preventing the CPU from spending much time in the low C and P
>>>> states. To me this implicates some scheduler problem since that's the
>>>> primary producer of IPI reschedules ... I think it wouldn't be a
>>>> significant extrapolation to predict that the scheduler might be the
>>>> cause of the above problem as well.
>>>>
>>>>
>>> Good point.
>>>
>>> The context switch rate actually went down a bit.
>>>
>>> I wonder if the Intel test people have records of /proc/interrupts for
>>> the various kernel versions.
>>>
>> I think Chinang does, but he's out of office today. He did say in an
>> earlier reply:
>>
>>
>>> I took a quick look at the interrupts figure between 2.6.24 and 2.6.27.
>>> i/o interuputs is slightly down in 2.6.27 (due to reduce throughput).
>>> But both NMI and reschedule interrupt increased. Reschedule interrupts
>>> is 2x of 2.6.24.
>>>
>> So if the reschedule interrupt is happening twice as often, and the
>> context switch rate is basically unchanged, I guess that means the
>> scheduler is doing a lot more work to get approximately the same
>> results. And that seems like a bad thing.
>>
I would be very interested in gathering some data in this area. One
thing that pops to mind is to instrument the resched-ipi with
ftrace_printk() and gather a trace of this system in action. I assume
that I wouldn't have access to this OLTP suite, so I may need a
volunteer to try this for me. I could put together an instrumentation
patch for the testers convenience if they prefer.
Another data-point I wouldn't mind seeing is looking at the scheduler
statistics, particularly with my sched-top utility, which you can find here:
http://rt.wiki.kernel.org/index.php/Schedtop_utility
(Note you may want to exclude the sched_info stats, as they are
inherently noisy and make it hard to see the real trends. To do this
run it with: 'schedtop -x "sched_info"'
In the meantime, I will try similar approaches here on other non-OLTP
based workloads to see if I spy anything that looks amiss.
-Greg
Download attachment "signature.asc" of type "application/pgp-signature" (258 bytes)
Powered by blists - more mailing lists