Message-ID: <48E3BE2D.4020606@sgi.com>
Date: Wed, 01 Oct 2008 11:15:09 -0700
From: Mike Travis <travis@....com>
To: Pavel Machek <pavel@...e.cz>
CC: Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>, rpurdie@...ys.net,
Jack Steiner <steiner@....com>, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/1] SGI X86 UV: Provide a System Activity Indicator driver
Pavel Machek wrote:
>> Another relevant point is that I will be adding a bit more functionality
>> to disable the timer interrupt on truly "idle" cpus (like have been idle
>> for some amount of seconds). We would then use the "exit from idle"
>> callback to reestablish the timer interrupt. [This would allow them to
>> enter power down states if appropriate.]
>
> Should you look at nohz instead of reinventing it?
Thanks, I did look at it. It's quite complex. Maybe I'm missing something,
but I don't see how it fits in. Are you saying I should be using data
in the percpu tick_sched to gather the idle information for the once
per second per cpu status update interrupt? I see the @idle_active
entry but wouldn't this always be false during the timer interrupt?
Using any other entries would appear to be more complex than a simple
byte store and a subtraction of two longs.
Or perhaps I should somehow hook into the sched_timer interrupt instead
of having a separate once per second per cpu interrupt? (Does this
sched_timer interrupt each cpu once per second?)
>
>>> As i suggested in my previous mail about this topic, a low-frequency
>>> sampling method should be used instead, to indicate system status. I
>>> thought the leds drivers have all that in place already.
>> It is low frequency (once per second), this is just setting what's to
>> be sampled.
>>
>> As I mentioned, this is not for LED displays (human readable), it's for the
>> system controller to monitor how all parts of the system are running, and
>> this one is just the cpu parts. The LED driver approach would have me
>> registering 4096 led devices, with all their callbacks, 4096 strings saying
>> "LED0001", etc., and I still cannot associate a specific register bit
>> (AKA LED if that's what it was), with a specific cpu using the LED driver.
>>
>> The LED driver is fine for a couple of blinking lights indicating overall
>> system activity, disk activity, etc. (Btw, I did not see a network trigger,
>> or a paging trigger, or an out of memory trigger, or some other things that
>> might be useful for real time monitoring of the system.)
>
> ...so add them...
>
>> But the LED driver has way more overhead than needed for this simple application.
>>
>
> So overhead from led driver is not okay, while overhead from messing
> with idle loop is okay? Interesting...
> Pavel
The overhead is mainly the registration of descriptor blocks for the
4096 registers representing the 4096 cpus all at separate addresses.
The overhead in this patch for maintaining the "idle" state (prior to the
timer interrupt causing "exit_idle") is storing a byte and subtracting the
current jiffies from the jiffies at the last one second timer interrupt.
(Even this subtraction can be removed, the only *important* item is
whether the cpu is currently idle or not.)
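A minimal userspace sketch of the bookkeeping described above: exit/enter
idle stores a single byte, and the once-per-second sampler does one
subtraction. The names here (cpu_state, enter_idle_hook, etc.) are
illustrative, not the actual driver's symbols.

```c
#include <stdint.h>

/* Per-cpu state kept in node-local memory in the real driver;
 * modeled here as a plain struct. */
struct cpu_state {
	uint8_t  is_idle;	/* 1 while the cpu sits in the idle loop */
	uint64_t idle_since;	/* jiffies when the cpu last went idle   */
};

/* On entry to the idle loop: store a byte, record the time. */
static void enter_idle_hook(struct cpu_state *s, uint64_t jiffies_now)
{
	s->is_idle = 1;
	s->idle_since = jiffies_now;
}

/* On exit from idle: clear the byte. */
static void exit_idle_hook(struct cpu_state *s)
{
	s->is_idle = 0;
}

/* Once-per-second sampler: the only arithmetic is one subtraction. */
static uint64_t idle_jiffies(const struct cpu_state *s, uint64_t jiffies_now)
{
	return s->is_idle ? jiffies_now - s->idle_since : 0;
}
```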
This data is written to node local memory that's highly likely to be in
the cache, as the same memory block is used for all UV hub operations.
Unfortunately, I am experiencing a simulator problem at the moment or
I'd be able to quantify the exact amount of time added to the exit_idle()
function, but it's basically noise in the overall resumption of a thread.
One other factor: this overhead is *only* for UV systems; no other x86_64
systems or architectures are affected, so again I'm not understanding the
objection. This request was made by our hardware and RAS engineers,
and is identical to what's been in the ia64 kernel for a few years now.
Perhaps the confusion is its close relationship to real "LED" lights?
The original name "LED" is historical. The bits are read by a system
controller that has the job of monitoring the entire system, including
both soft and hard errors and determining faulty [or near faulty]
system components. For example, if a node suddenly hangs, this is
one of the diagnostic aids used in determining the state of that node.
(Btw, the SCIR register that is written to once per second is a FIFO so
it contains the last 64 updates of this register giving a temporal view
of each cpu as well.)
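The FIFO behavior described above can be modeled as a 64-deep ring buffer:
each once-per-second write pushes a new status byte, and the controller can
read back any of the last 64 samples. This is a hypothetical userspace
sketch of that semantics, not the hardware register interface.

```c
#include <stdint.h>

#define SCIR_DEPTH 64	/* register retains the last 64 updates */

struct scir_fifo {
	uint8_t  samples[SCIR_DEPTH];
	uint64_t count;		/* total writes so far */
};

/* Once-per-second status write: overwrite the oldest slot. */
static void scir_push(struct scir_fifo *f, uint8_t status)
{
	f->samples[f->count % SCIR_DEPTH] = status;
	f->count++;
}

/* Read the i-th most recent sample (i == 0 is the newest).
 * Returns -1 if that sample has not been written yet. */
static int scir_read(const struct scir_fifo *f, unsigned int i)
{
	if (i >= SCIR_DEPTH || (uint64_t)i >= f->count)
		return -1;
	return f->samples[(f->count - 1 - i) % SCIR_DEPTH];
}
```

A system controller polling this model sees a 64-second temporal window per
cpu, which is what makes a sudden hang distinguishable from ordinary idleness.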
Thanks,
Mike