Message-ID: <4AE61D84.9000107@sgi.com>
Date: Mon, 26 Oct 2009 15:07:00 -0700
From: Mike Travis <travis@....com>
To: Andi Kleen <andi@...stfloor.org>
CC: Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Jack Steiner <steiner@....com>,
Randy Dunlap <rdunlap@...otime.net>,
Steven Rostedt <rostedt@...dmis.org>,
Greg Kroah-Hartman <gregkh@...e.de>,
Frederic Weisbecker <fweisbec@...il.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Robin Getz <rgetz@...log.com>,
Dave Young <hidave.darkstar@...il.com>,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function
Andi Kleen wrote:
> On Mon, Oct 26, 2009 at 11:03:59AM -0700, Mike Travis wrote:
>>
>> Andi Kleen wrote:
>>> Mike Travis <travis@....com> writes:
>>>
>>>> With a large number of processors in a system there is an excessive amount
>>>> of messages sent to the system console. It's estimated that with 4096
>>>> processors in a system, and the console baudrate set to 56K, the startup
>>>> messages will take about 84 minutes to clear the serial port.
>>>>
>>>> This patch adds (for SGI UV only) a kernel start option
>>>> "limit_console_output" (or 'lco' for short), which when set provides the
>>>> ability to temporarily reduce the console loglevel during system startup.
>>>> This allows informative messages to still be seen on the console without
>>>> producing excessive amounts of repetitious messages.
>>>>
>>>> Note that all the messages are still available in the kernel log buffer.
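A back-of-the-envelope check of the 84 minute figure quoted above, as a small C sketch. The 8N1 framing (10 bits on the wire per character) and the reading of "56K" as exactly 56000 baud are assumptions made here, not something stated in the patch description, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, so each character costs 10 bits on the wire. */
#define BITS_PER_CHAR 10u

/* Seconds needed to push nbytes through a serial console at the given baud. */
double drain_seconds(uint64_t nbytes, unsigned baud)
{
    assert(baud > 0);
    return (double)nbytes * BITS_PER_CHAR / (double)baud;
}

/* Inverse: how many characters fit through the port in the given minutes. */
uint64_t chars_in_minutes(unsigned minutes, unsigned baud)
{
    assert(baud > 0);
    return (uint64_t)minutes * 60u * baud / BITS_PER_CHAR;
}
```

Under those assumptions, chars_in_minutes(84, 56000) comes to about 28 million characters in total, or a little under 7,000 characters per CPU on a 4096 processor system.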
>>> I've run into the same problem (kernel log being flooded on systems with a
>>> large number of CPU threads). It's definitely not a UV-only problem. Making
>>> such an option UV-only is definitely not the right approach; if anything it
>>> needs to be for everyone.
>> I could use something like the MAXSMP config option to enable it...?
>
> No, it's a problem long before MAXSMP sizes.
>
>>> Frankly a lot of these messages made sense for debugging at some point,
>>> but really don't anymore and should just be removed.
>> That they still go to the kernel log buffer means the messages are still
>> available for debugging system problems. KDB has a kernel print option if
>> you end up there before being able to use 'dmesg'.
>
> Again they should be just reevaluated and pr_debug()ed or completely
> removed.
>
>>> Also I don't like the defaults of on. It would be better to evaluate if
>>> these various messages are really useful and if they are not just remove them.
>> I believe most distros already do that by setting the loglevel argument
>> (but I could be wrong since I haven't looked at too many of them.)
>
> Even spamming dmesg is a problem. loglevel doesn't fix that.
>
>>> For example do we really need the scheduler debug messages by default?
>> This was the most painful message at NASA (which has a 2k CPU system). It took
>> well over an hour for these scheduler messages to print, just because we wanted
>> to get some other DEBUG prints.
>
> They should be just removed.
I had changed this to CONFIG_DEBUG_SCHED at one time. Perhaps this would be
acceptable?
>
>>> Or do we really need to print the caches for each CPU at boot? The information
>>> is in sysfs anyways and rarely changes (I added this originally on 64bit,
>>> but in hindsight it was a bad idea)
>> I was attempting not to decide whether each message was pertinent, only if it
>> was redundant.
>
> You should decide or at least ask whoever added it
>
> ("How many bugs did you fix with that message last year?" If the answer
> is < 10 or so, remove it)
Ok.
>>> I don't think it makes much sense to print more than 2-3 lines for each CPU boot
>>> for example.
>> That would still be 4 to 12 thousand lines of information which, as you say, is
>> available by other means.
>
> A simple checkpoint for debugging is not available by other means.
>
> The cache, mce etc. information is.
>
> For the checkpoint problem on CPU boot it might be reasonable
> to write them into a special buffer and only print it when the other
> CPU does not come up (BP detects a time out)
>
> With that a single line of per CPU output should be feasible without
> losing any debuggability.
>
> In fact debuggability could be improved by putting the output
> at better strategic points instead of the ad-hoc way it is currently.
>
> -Andi
>
Ok, thanks for the feedback. I'll see about reducing the output more
intelligently for CPUs (as per Ingo's suggestions as well).
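
For illustration only, the buffered-checkpoint idea quoted above (record per-CPU boot checkpoints quietly, and print them only if the boot processor sees a timeout) might look roughly like the following. This is a userspace sketch, not kernel code; every name and size in it is hypothetical, and a real version would need proper locking and per-CPU storage.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_NR_CPUS  8   /* hypothetical; the systems discussed have thousands */
#define DEMO_MAX_MSGS 4   /* checkpoints kept per CPU */
#define DEMO_MSG_LEN  64

struct cpu_boot_log {
    char msgs[DEMO_MAX_MSGS][DEMO_MSG_LEN];
    int count;
};

static struct cpu_boot_log boot_logs[DEMO_NR_CPUS];

/* Record a checkpoint silently instead of printing it to the console. */
void boot_checkpoint(int cpu, const char *msg)
{
    struct cpu_boot_log *log;

    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    log = &boot_logs[cpu];
    if (log->count < DEMO_MAX_MSGS) {
        snprintf(log->msgs[log->count], DEMO_MSG_LEN, "%s", msg);
        log->count++;
    }
}

/*
 * Called once the boot processor knows whether the CPU came up.
 * Success costs a single line; a timeout dumps the saved checkpoints.
 * Returns the number of lines written to out.
 */
int boot_report(int cpu, int came_up, FILE *out)
{
    struct cpu_boot_log *log;
    int lines = 1;
    int i;

    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    log = &boot_logs[cpu];
    if (came_up) {
        fprintf(out, "CPU %d: booted\n", cpu);
    } else {
        fprintf(out, "CPU %d: timed out after %d checkpoint(s):\n",
                cpu, log->count);
        for (i = 0; i < log->count; i++) {
            fprintf(out, "CPU %d:   %s\n", cpu, log->msgs[i]);
            lines++;
        }
    }
    log->count = 0; /* the saved checkpoints are not needed after reporting */
    return lines;
}
```

With this shape a CPU that boots normally produces one console line no matter how many checkpoints it passed, while a CPU that fails to come up still shows how far it got.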
Mike