Message-ID: <17542.1181753455@turing-police.cc.vt.edu>
Date: Wed, 13 Jun 2007 12:50:55 -0400
From: Valdis.Kletnieks@...edu
To: holzheu@...ux.vnet.ibm.com
Cc: linux-kernel@...r.kernel.org, randy.dunlap@...cle.com,
akpm@...l.org, gregkh@...e.de, mtk-manpages@....net,
schwidefsky@...ibm.com, heiko.carstens@...ibm.com
Subject: Re: [RFC/PATCH] Documentation of kernel messages
On Wed, 13 Jun 2007 17:06:57 +0200, holzheu said:
> They are used to that, because all the other operating systems on that
> platform, like z/OS, z/VM, or z/VSE, have message catalogs with detailed
> descriptions of the semantics of the messages.
25 years ago, I did OS/MVT and OS/VS1 for a living, so I know *all* about
the infamous "What does IEF507E mean again?"...
> In general, we think that it is also a good thing for Linux to have
> documentation for the most important kernel/driver messages. Even
> kernel hackers are not always aware of the meaning of kernel messages
> for components they don't know in detail. Most of the messages
> are self-explanatory, but sometimes you get something like "Clocksource
> tsc unstable (delta = 7304132729 ns)" and you wonder if your system is
> going to explode.
This is probably best addressed by cleaning up the actual messages so they're
a bit more informative.
> New macros KMSG_ERR(), KMSG_WARN(), etc. are defined, which have to be
> used in printk. These macros take the message number as a parameter and
> use a macro, KMSG_COMPONENT, that is defined per C file.
Gaak. *NO*.
The *only* reason that the MVS and VM message catalogs worked at all is
that each component had a message repository that spanned *all* of its
source files - the instant you saw IEFnnns, you knew that IEF covered the
job scheduler, nnn was a *unique* number, and s was a Severe/Warning/Info
flag. IGG was always data management, and so on. This breaks horribly if
you have two C files that define subtly different KMSG_COMPONENT values (or,
even worse, two or more files that define the same value).
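A minimal C sketch of that failure mode follows. It is not the code from the patch: it assumes a hypothetical expansion in which the message ID is simply KMSG_COMPONENT, a dot, and the message number, and the component name "setup", the KMSG_ID() helper, and the message texts are all invented for illustration. Two files (simulated here by redefining KMSG_COMPONENT within one translation unit) that independently pick the same name end up emitting the same ID for unrelated messages.

```c
#include <string.h>

/* Hypothetical: how a per-file KMSG_COMPONENT might become a message ID. */
#define KMSG_ID(num) KMSG_COMPONENT "." #num
/*
 * Hypothetical KMSG_ERR(): a real one would call printk(); this one only
 * builds the string literal so the result can be inspected. "<3>" mimics
 * KERN_ERR as it was defined at the time.
 */
#define KMSG_ERR(num, text) "<3>" KMSG_ID(num) ": " text "\n"

/* Simulated file 1: say, one driver's setup.c picks "setup". */
#define KMSG_COMPONENT "setup"
static const char file1_id[] = KMSG_ID(1);
static const char file1_msg[] = KMSG_ERR(1, "device not operational");
#undef KMSG_COMPONENT

/* Simulated file 2: another setup.c picks the same name independently. */
#define KMSG_COMPONENT "setup"
static const char file2_id[] = KMSG_ID(1);
static const char file2_msg[] = KMSG_ERR(1, "invalid boot parameter");
#undef KMSG_COMPONENT

/* Nonzero if both "files" emit the same ID for different messages. */
static int ids_collide(void)
{
	return strcmp(file1_id, file2_id) == 0 &&
	       strcmp(file1_msg, file2_msg) != 0;
}
```

Both IDs come out as "setup.1", so a catalog keyed on the ID cannot tell the two messages apart.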
[/usr/src/linux-2.6.22-rc4-mm2] find . -name '*.c' | wc -l
9959
[/usr/src/linux-2.6.22-rc4-mm2] find . -name '*.h' | wc -l
9933
[/usr/src/linux-2.6.22-rc4-mm2] find . -type d | wc -l
1736
You plan to maintain message uniqueness how?
[/usr/src/linux-2.6.22-rc4-mm2] find . -name '*.c' | sed -r 's?.*/([^/]*)?\1?' | sort | uniq -c | sort -nr | head
105 setup.c
90 irq.c
66 time.c
58 init.c
50 inode.c
39 io.c
38 pci.c
37 file.c
32 signal.c
32 ptrace.c
Looks like you're going to have to embed a lot of the path in that KMSG_COMPONENT
to make it unique - and you want to keep that message under 80 or so chars total.
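To put a rough number on that, here is a small C sketch built on assumptions: it supposes (hypothetically) that a unique component name is made by turning the source path into a dotted name, and that each line is laid out as "<component>.<number>: <text>". The helpers kmsg_component_from_path() and kmsg_text_budget() are invented for this illustration and are not part of the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: build a component name from a source path by dropping a
 * trailing ".c" and turning '/' into '.', e.g.
 * "drivers/s390/cio/device.c" -> "drivers.s390.cio.device".
 * Truncates to fit 'outlen' bytes including the terminating NUL.
 */
static void kmsg_component_from_path(const char *path, char *out, size_t outlen)
{
	size_t n = strlen(path);
	size_t i;

	if (outlen == 0)
		return;
	if (n >= 2 && strcmp(path + n - 2, ".c") == 0)
		n -= 2;			/* strip the ".c" suffix */
	if (n > outlen - 1)
		n = outlen - 1;		/* truncate instead of overflowing */
	for (i = 0; i < n; i++)
		out[i] = (path[i] == '/') ? '.' : path[i];
	out[n] = '\0';
}

/*
 * Columns left for the message text on a 'width'-column line, assuming
 * the hypothetical layout "<component>.<number>: <text>".
 * Returns 0 if the prefix alone already fills the line.
 */
static int kmsg_text_budget(const char *component, int number, int width)
{
	int numlen = snprintf(NULL, 0, "%d", number);
	int prefix = (int)strlen(component) + 1 + numlen + 2;	/* "." and ": " */

	return width > prefix ? width - prefix : 0;
}
```

For drivers/s390/cio/device.c and message 507, the prefix alone takes 29 columns, leaving 51 of 80 for the text; deeper paths leave less.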
> /**
> * message
> * @0: device number of device.
> *
> * Description:
> * An operation has been performed on the msgtest device, but the
> * device has not been set online. Therefore, the operation failed.
If you don't understand 'Device /dev/foo offline', this description
doesn't help any. And that's true for *most* kernel messages - if you
don't understand the message already, a paragraph of explanation isn't
going to help much. Consider the average OOPS message, which contains
stuff like 'EIP=0x..'. Telling the user that EIP means Extended
Instruction Pointer isn't likely to help - if they knew what the
instruction pointer *did*, they'd probably already know what EIP is.
> *
> * User Response:
> * Operator should set device online.
> * Issue "chccwdev -e <device number>".
And this is where the weakness of this scheme *really* hits. I've actually run
into cases where an operator followed the listed "Operator Response" for a
"device offline" message and issued a 'VARY 0C0,ONLINE'. And then we got a flood
of I/O errors, because the previous shift had taken the device down since it was
having issues. What the operator *should* have done was "assign a different
tape drive, like, oh, maybe the operational ones at 0C1 through 0C4"...
And it's the same here - if you get a message that /dev/sdb1 has no media
present, there's a good chance that you made a typo and meant /dev/sda1 or
/dev/sdc1. So following the directions for 'sdb1: no media present' and putting
in a blank DVD because sdb is the DVD burner won't fix things if what you were
actually trying to do was mkfs something on another disk... ;)
And while we're at it, I'll point out that any attempt to "fix" the kernel
messages on this scale had *better* solve all the I18N problems as well....