Message-ID: <AANLkTineeFcSNx09P=2SZGcBHeq5p_LQ54nU=uGVv_Ck@mail.gmail.com>
Date: Tue, 14 Dec 2010 14:33:02 -0800
From: Mike Waychison <mikew@...gle.com>
To: Matt Mackall <mpm@...enic.com>
Cc: simon.kagstrom@...insight.net, davem@...emloft.net,
    nhorman@...driver.com, adurbin@...gle.com,
    linux-kernel@...r.kernel.org, chavey@...gle.com,
    Greg KH <greg@...ah.com>, netdev@...r.kernel.org,
    Américo Wang <xiyou.wangcong@...il.com>,
    akpm@...ux-foundation.org, linux-api@...r.kernel.org
Subject: Re: [PATCH v3 21/22] netoops: Add user-programmable boot_id
On Tue, Dec 14, 2010 at 2:06 PM, Matt Mackall <mpm@...enic.com> wrote:
> On Tue, 2010-12-14 at 13:59 -0800, Mike Waychison wrote:
>> On Tue, Dec 14, 2010 at 1:42 PM, Matt Mackall <mpm@...enic.com> wrote:
>> > On Tue, 2010-12-14 at 13:30 -0800, Mike Waychison wrote:
>> >> Add support for letting userland define a 32bit boot id. This is useful
>> >> for users to be able to correlate netoops reports to specific boot
>> >> instances offline.
>> >
>> > This sounds a lot like the pre-existing /proc/sys/kernel/random/boot_id
>> > that's used by kerneloops.org.
>>
>> Could be. I'm looking at it now... There is no documentation for this
>> boot_id field?
>
> Probably not. It's just a random number generated at boot.
>
>> Reusing this guy would work, except that it doesn't appear to allow
>> arbitrary values to be set. We need to inject our boot sequence
>> number (which is figured out in userland) in the packet somehow as we
>> need to correlate it to our other monitoring systems.
>
> What happens if you oops before userspace is available?
>
One of two general cases:

 - The crash is a one-off and the machine comes back.  The boot
   sequence numbers we record will have a hole in them, which is a
   clue that something bad happened.
 - The machine is in a crash loop.  This has the same failure mode
   for us as if the machine never made it onto the network for
   whatever reason: bad cables, bad firmware, bad RAM, ...

In both cases, we can detect that something is wrong and handle it.
Note that our firmware is responsible for incrementing the boot
sequence at bootup, which is why the above works. In general though,
our machines do make it up to userland -- staying alive once booted is
the hard part ;)
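
To make the "hole in the sequence" check concrete, here is a minimal
sketch of the offline side.  It assumes a monitoring system that
records the boot sequence number a machine reports each time it makes
it up to userland; the function name and the input format are
hypothetical and only for illustration, not what we actually run.  It
also ignores wraparound of the 32bit value.

```python
from typing import Iterable, List, Tuple


def find_boot_gaps(reported_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges of boot
    sequence numbers that were never reported by userland.

    Hypothetical helper: assumes firmware bumps the number by one on
    every boot, so any number missing between two reported ones is a
    boot that died before it could report.
    """
    ids = sorted(set(reported_ids))
    gaps = []
    # Compare each reported number with the next one reported.
    for prev, cur in zip(ids, ids[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


if __name__ == "__main__":
    # Boots 43 and 46..48 never reached userland in this made-up history.
    for first, last in find_boot_gaps([41, 42, 44, 45, 49]):
        print(f"missing boots {first}..{last}")
```

A single missing number looks like the one-off case; a long run of
missing numbers, or a machine that stops reporting entirely, looks
like the crash loop case.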