Message-ID: <20130410164914.GA18946@redhat.com>
Date: Wed, 10 Apr 2013 12:49:14 -0400
From: David Teigland <teigland@...hat.com>
To: Don Zickus <dzickus@...hat.com>
Cc: Guenter Roeck <linux@...ck-us.net>, Dave Young <dyoung@...hat.com>,
linux-watchdog@...r.kernel.org, kexec@...ts.infradead.org,
wim@...ana.be, LKML <linux-kernel@...r.kernel.org>,
vgoyal@...hat.com
Subject: Re: [RFC PATCH] watchdog: Add hook for kicking in kdump path
On Wed, Apr 10, 2013 at 09:40:39AM -0400, Don Zickus wrote:
> However, we still have the problem that if the machine panics and we want
> to jump into the kdump kernel, we need to 'kick' the watchdog one more
> time. This provides us a sane sync point for determining how long we have
> to load the watchdog driver in the second kernel before the hardware
> reboots us. Otherwise the reboots are pretty random and nothing is
> guaranteed.
Some time ago I submitted this patch
http://www.spinics.net/lists/linux-watchdog/msg01477.html
to get rid of the one "extraneous" ping that was causing me trouble.
I'd still like to see it merged, but haven't had time to follow up.
I have a use case where I need to guarantee that the watchdog
will *not* be pinged unless my userland daemon does the ping.
If my daemon is killed, the close() generates a ping that I
don't intend. This kdump ping looks like it would be another
instance that I'd need to suppress. Perhaps I could rename my flag
to WDOG_NO_EXTRA_PING and check it both in release and in
kick_for_kdump?
(My daemon associates watchdog pings with shared storage heartbeats.
Based on the heartbeats, hosts in a cluster can calculate when an
unresponsive host last pinged its watchdog, and can be fairly
certain that the "dead" host has been reset by its watchdog 60
seconds later. This is used as an alternative to I/O fencing
where we're protecting data on shared storage from corruption
after host failures. If there are uncontrolled watchdog pings,
then hosts don't know when a dead host might have last pinged
its watchdog, since it is no longer based on the last timestamp
it wrote to shared storage.)
Dave