[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1292366864.3446.875.camel@calx>
Date: Tue, 14 Dec 2010 16:47:44 -0600
From: Matt Mackall <mpm@...enic.com>
To: Mike Waychison <mikew@...gle.com>
Cc: simon.kagstrom@...insight.net, davem@...emloft.net,
nhorman@...driver.com, adurbin@...gle.com,
linux-kernel@...r.kernel.org, chavey@...gle.com,
Greg KH <greg@...ah.com>, netdev@...r.kernel.org,
Américo Wang <xiyou.wangcong@...il.com>,
akpm@...ux-foundation.org, linux-api@...r.kernel.org
Subject: Re: [PATCH v3 21/22] netoops: Add user-programmable boot_id
On Tue, 2010-12-14 at 14:33 -0800, Mike Waychison wrote:
> On Tue, Dec 14, 2010 at 2:06 PM, Matt Mackall <mpm@...enic.com> wrote:
> > On Tue, 2010-12-14 at 13:59 -0800, Mike Waychison wrote:
> >> On Tue, Dec 14, 2010 at 1:42 PM, Matt Mackall <mpm@...enic.com> wrote:
> >> > On Tue, 2010-12-14 at 13:30 -0800, Mike Waychison wrote:
> >> >> Add support for letting userland define a 32bit boot id. This is useful
> >> >> for users to be able to correlate netoops reports to specific boot
> >> >> instances offline.
> >> >
> >> > This sounds a lot like the pre-existing /proc/sys/kernel/random/boot_id
> >> > that's used by kerneloops.org.
> >>
> >> Could be. I'm looking at it now... There is no documentation for this
> >> boot_id field?
> >
> > Probably not. It's just a random number generated at boot.
> >
> >> Reusing this guy would work, except that it doesn't appear to allow
> >> arbitrary values to be set. We need to inject our boot sequence
> >> number (which is figured out in userland) in the packet somehow as we
> >> need to correlate it to our other monitoring systems.
> >
> > What happens if you oops before userspace is available?
> >
>
> Either one of two general cases:
> - The crash is a one-off and the machine comes back. The boot
> number sequence will see a hole in it, which is a clue that something
> bad happened.
> - The machine is in a crash loop. This has the same failure mode
> for us as if the machine never made it onto the network due to
> whatever reason: bad cables, bad firmware, bad ram, ...
>
> In both cases, we can detect that something is wrong and handle it.
> Note that our firmware is responsible for incrementing the boot
> sequence at bootup, which is why the above works. In general though,
> our machines do make it up to userland -- staying alive once booted is
> the hard part ;)
Interesting. Is this Google-specific firmware magic? I'd probably accept
a hook in random.c to fold a number into the UUID, which would unify
things.
--
Mathematics is the supreme nostalgia of our time.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists