Message-ID: <ebc0923a-48c1-ccd4-6b89-c4ba9ac48da4@roeck-us.net>
Date: Wed, 6 Apr 2022 09:52:05 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Sebastian Ene <sebastianene@...gle.com>
Cc: Wim Van Sebroeck <wim@...ux-watchdog.org>,
Rob Herring <robh+dt@...nel.org>, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-watchdog@...r.kernel.org,
will@...nel.org, qperret@...gle.com, maz@...nel.org
Subject: Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
On 4/6/22 09:31, Sebastian Ene wrote:
> On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
>> Sebastian,
>>
>
> Hello Guenter,
>
>> On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
>>> This patch adds support for a virtual watchdog which relies on the
>>> per-cpu hrtimers to pet at regular intervals.
>>>
>>
>> The watchdog subsystem is not intended to detect soft and hard lockups.
>> It is intended to detect userspace issues. A watchdog driver requires
>> a userspace component which needs to ping the watchdog on a regular basis
>> to prevent timeouts (and watchdog drivers are supposed to use the
>> watchdog kernel API).
>>
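For illustration only, the userspace side described above can be sketched in C roughly as follows. This is a minimal sketch, not code from the patch or from any existing daemon: the helper names wdt_ping() and wdt_ping_loop() are hypothetical, and the sketch assumes the device has already been opened by the caller. It relies only on the fact that a non-empty write to a watchdog device node counts as a ping.

```c
#include <assert.h>
#include <unistd.h>

/* Ping once: a non-empty write to the watchdog device acts as a keepalive. */
int wdt_ping(int fd)
{
    assert(fd >= 0);
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: ping `count` times, sleeping `interval_s` seconds
 * between pings. A real daemon would loop until it is told to stop.
 */
int wdt_ping_loop(int fd, unsigned int interval_s, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        if (wdt_ping(fd) < 0)
            return -1;
        if (i + 1 < count)
            sleep(interval_s);
    }
    return 0;
}
```

A daemon would typically open /dev/watchdog with O_WRONLY and call something like wdt_ping_loop() with an interval shorter than the configured timeout. If userspace hangs and the pings stop, the timeout expires and the system is reset, which is the failure mode the watchdog subsystem is meant to catch.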
>
> Thanks for getting back to me! I wanted to create a mechanism to detect
> stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
> to create watchdogs bound to individual CPUs (in the same way as PowerPC has
> kernel/watchdog.c).
> The per-CPU watchdog is needed to account for time during which the guest
> is not running (either scheduled out or waiting for an event), to prevent
> spurious reset events caused by the watchdog.
>
>> What you have here is a CPU stall detection mechanism, similar to the
>> existing soft/hard lockup detection mechanism. This code does not
>> belong in the watchdog subsystem; it is similar to the existing
>> hard/softlockup detection code (kernel/watchdog.c) and should reside
>> at the same location.
>>
>
> I agree that this doesn't belong in the watchdog subsystem, but the current
> stall detection mechanism calls through MMIO into a virtual device
> 'qemu,virt-watchdog'. Isn't calling a device from kernel/watchdog.c
> something that we should avoid?
>
You are introducing qemu,virt-watchdog, so it seems to me that any argument
along that line doesn't really apply.

I think it is more a matter for core kernel developers to discuss and
decide how this functionality is best instantiated. It doesn't _have_
to be a device, after all, just like the current lockup detection
code is not a device. Either way, I am not really the right person
to discuss this since it is a matter of core kernel code which I am
not sufficiently familiar with. All I can say is that watchdog drivers
in the watchdog subsystem have a different scope.
Guenter
>> Having said that, I could imagine a watchdog driver to be used in VMs,
>> but that would be similar to existing watchdog drivers. The easiest way
>> to get there would probably be to just instantiate one of the watchdog
>> devices already supported by qemu.
>>
>
> I am looking forward to your response,
>
>> Guenter
>
> Cheers,
> Sebastian