linux-kernel - Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function

Message-ID: <4AE61D84.9000107@sgi.com>
Date:	Mon, 26 Oct 2009 15:07:00 -0700
From:	Mike Travis <travis@....com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jack Steiner <steiner@....com>,
	Randy Dunlap <rdunlap@...otime.net>,
	Steven Rostedt <rostedt@...dmis.org>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Robin Getz <rgetz@...log.com>,
	Dave Young <hidave.darkstar@...il.com>,
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH 1/8] SGI x86_64 UV: Add limit console output function



Andi Kleen wrote:
> On Mon, Oct 26, 2009 at 11:03:59AM -0700, Mike Travis wrote:
>>
>> Andi Kleen wrote:
>>> Mike Travis <travis@....com> writes:
>>>
>>>> With a large number of processors in a system, an excessive number of
>>>> messages is sent to the system console.  It's estimated that with 4096
>>>> processors in a system, and the console baudrate set to 56K, the startup
>>>> messages will take about 84 minutes to clear the serial port.
>>>>
>>>> This patch adds (for SGI UV only) a kernel start option
>>>> "limit_console_output" (or 'lco' for short), which when set provides the
>>>> ability to temporarily reduce the console loglevel during system startup.
>>>> This allows informative messages to still be seen on the console without
>>>> producing excessive amounts of repetitious messages.
>>>>
>>>> Note that all the messages are still available in the kernel log buffer.
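
As a rough sanity check of the 84-minute figure, here is a back-of-envelope sketch in C. It assumes "56K" means 56000 bits/s and 8N1 framing (10 bits on the wire per character); neither is stated in the patch description, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, so each character costs 10 bits on the wire. */
#define BITS_PER_CHAR 10

/* Seconds needed to push total_chars through a serial line at baud bits/s. */
static double drain_seconds(uint64_t total_chars, unsigned baud)
{
	assert(baud > 0);
	return (double)total_chars * BITS_PER_CHAR / baud;
}

/* Characters of console output per CPU implied by a total drain time. */
static double chars_per_cpu(double seconds, unsigned baud, unsigned ncpus)
{
	assert(baud > 0 && ncpus > 0);
	return seconds * baud / BITS_PER_CHAR / ncpus;
}
```

Under those assumptions, chars_per_cpu(84 * 60, 56000, 4096) comes to roughly 6.9K characters per CPU, i.e. on the order of 86 full 80-column lines for every processor.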
>>> I've run into the same problem (kernel log being flooded on systems with
>>> a large number of CPU threads). It's definitely not a UV only problem.
>>> Making such an option UV only is definitely not the right approach, if
>>> anything it needs to be for everyone.
>> I could use something like the MAXSMP config option to enable it...?
> 
> No, it's a problem long before MAXSMP sizes.
> 
>>> Frankly a lot of these messages made sense for debugging at some point,
>>> but really don't anymore and should just be removed.
>> That they still go to the kernel log buffer means the messages are still
>> available for debugging system problems.  KDB has a kernel print option if
>> you end up there before being able to use 'dmesg'.
> 
> Again they should be just reevaluated and pr_debug()ed or completely
> removed.
> 
>>> Also I don't like the default of on. It would be better to evaluate if
>>> these various messages are really useful and if they are not just remove them.
>> I believe most distros already do that by setting the loglevel argument
>> (but I could be wrong since I haven't looked at too many of them.)
> 
> Even spamming dmesg is a problem. loglevel doesn't fix that.
> 
>>> For example do we really need the scheduler debug messages by default?
>> This was the most painful message at NASA (which has a 2k cpu system).  It took
>> well over an hour for these scheduler messages to print, just because we wanted
>> to get some other DEBUG prints.
> 
> They should be just removed.

I had changed this to CONFIG_DEBUG_SCHED at one time.  Perhaps this would be
acceptable?

> 
>>> Or do we really need to print the caches for each CPU at boot? The information
>>> is in sysfs anyways and rarely changes (I added this originally on 64bit,
>>> but in hindsight it was a bad idea)
>> I was attempting not to decide whether each message was pertinent, only if it
>> was redundant.
> 
> You should decide or at least ask whoever added it
> 
> ("How many bugs did you fix with that message last year?" If the answer
> is < 10 or so, remove it)

Ok.

>>> I don't think it makes much sense to print more than 2-3 lines for each CPU boot
>>> for example.
>> That would still be 4 to 12 thousand lines of information which, as you say, is
>> available by other means.
> 
> A simple checkpoint for debugging is not available by other means.
> 
> The cache, mce etc. information is.
> 
> For the checkpoint problem on CPU boot it might be reasonable
> to write them into a special buffer and only print it when the other
> CPU does not come up (BP detects a timeout).
> 
> With that a single line of per CPU output should be feasible without
> losing any debuggability.
> 
> In fact debuggability could be improved by putting the output 
> at better strategic points instead of the ad-hoc way it is currently.
> 
> -Andi
> 
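
A minimal userspace sketch of the buffered-checkpoint idea described above, only to make the control flow concrete. It is not kernel code: NCPUS, CKPT_BUF, struct cpu_ckpt, ckpt_note() and ckpt_report() are all hypothetical names, and a real version would need per-CPU storage and proper synchronization between the boot processor and the CPU being brought up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCPUS    4	/* hypothetical; kept tiny for illustration */
#define CKPT_BUF 256	/* hypothetical per-CPU buffer size */

struct cpu_ckpt {
	char buf[CKPT_BUF];	/* NUL-terminated checkpoint text */
	size_t len;		/* bytes used, not counting the NUL */
};

static struct cpu_ckpt ckpt[NCPUS];

/* Record a boot checkpoint quietly instead of printing it. */
static void ckpt_note(int cpu, const char *msg)
{
	struct cpu_ckpt *c;
	size_t room;
	int n;

	assert(cpu >= 0 && cpu < NCPUS);
	c = &ckpt[cpu];
	room = CKPT_BUF - c->len - 1;
	n = snprintf(c->buf + c->len, CKPT_BUF - c->len, "%s\n", msg);
	if (n > 0)	/* on truncation, count only what actually fit */
		c->len += (size_t)n < room ? (size_t)n : room;
}

/*
 * Called by the boot processor once its wait for `cpu` has ended.
 * Success costs one line; only a timeout dumps the saved checkpoints.
 */
static int ckpt_report(int cpu, int came_up, char *out, size_t outsz)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (came_up) {
		ckpt[cpu].len = 0;	/* checkpoints no longer needed */
		ckpt[cpu].buf[0] = '\0';
		return snprintf(out, outsz, "CPU %d: online\n", cpu);
	}
	return snprintf(out, outsz, "CPU %d: timed out, checkpoints:\n%s",
			cpu, ckpt[cpu].buf);
}
```

With this shape, a CPU that boots normally produces a single line of output, while one that fails to come up gets its full checkpoint trail printed by the boot processor.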

Ok, thanks for the feedback.  I'll see about reducing the output more
intelligently for CPUs (as per Ingo's suggestions as well).

Mike