[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b3ece790901151539s1c470472x5255688477d54d7f@mail.gmail.com>
Date: Thu, 15 Jan 2009 15:39:07 -0800
From: Tim Hockin <thockin@...il.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
ying.huang@...el.com, Aaron Durbin <adurbin@...il.com>,
priyankag@...gle.com
Subject: Re: x86/mce merge, integration hickup + crash, design thoughts
On Thu, Jan 15, 2009 at 2:56 PM, Andi Kleen <ak@...ux.intel.com> wrote:
> Tim Hockin wrote:
>>
>> Yeah, no offense, but that's horrible :)
>
> I'm not sure it's worse than the XML like format proposals that seem to
> get thrown around. That is I am the only one who mentioned the
> X word yet, but the structured ASCII records that have been hinted
> at would be exactly like that.
Yeah, if anyone proposes XML, I'll punch them :) I'm not that firm on
it being ASCII myself, but it would be OK. Google's protocol buffers
are a good example of a very efficient encoding of flexible data.
This is not an unsolvable problem.
>> SATA
>> disk timeouts.
>
> Now that's a different issue. Generalized driver error reporting for
> everyone.
>
> There was a lot of discussion some years ago from a IBM proposal to do
> in general structured error reporting. But that was quite unpopular
> and no-one really liked it.
>
> What came out of it was the dev_printk() stuff that allows
> to match error messages to devices. So you already have some
> baby steps in this direction.
I remember - I was at Sun at the time. We wanted the same thing.
> I suspect doing this fully generalized would be quite difficult
> because there would be so many people you have to convince.
Yes, it will be hard. But I think it will not be impossible.
>> Now I know there are different conduits for some events - netlink
>> tells me about netif link up/down events I think. I would settle for
>> a small number of interfaces. What I don't want is what we have today
>> - EVERYTHING has a different interface. Some are poll()-able. Some
>> have to be actively polled. Some have to have a daemon listening or
>> else messages are dropped.
>
> Well the kernel will always have limited buffers, so the someone
> needs to listen problem will be always there.
Sure, but just having a small buffer, like mcelog, and being
poll()able goes a log way.
>> Put it this way: Given a thousand machines, I want to gather,
>> collate, and correlate all these events. I want to be able to produce
>> a "life story" of sorts for a machine and for a data center. Once I
>> can do that, I can start to make predictive diagnoses more accurately,
>> and I can know how much these things actually COST us.
>
> Sure sounds nice. But frankly I don't see it happening. It would
> be just too radical a change of too much code.
We do a lot of this now, but it's far less accurate than it could be,
because we drop so many events. And programming to all the different
interfaces is difficult and error-prone.
I'm hoping we can find a solution that makes everyone only a little bit unhappy.
Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists