linux-kernel - Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ebc0923a-48c1-ccd4-6b89-c4ba9ac48da4@roeck-us.net>
Date:   Wed, 6 Apr 2022 09:52:05 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Sebastian Ene <sebastianene@...gle.com>
Cc:     Wim Van Sebroeck <wim@...ux-watchdog.org>,
        Rob Herring <robh+dt@...nel.org>, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-watchdog@...r.kernel.org,
        will@...nel.org, qperret@...gle.com, maz@...nel.org
Subject: Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest
 vCPUs

On 4/6/22 09:31, Sebastian Ene wrote:
> On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
>> Sebastian,
>>
> 
> Hello Guenter,
> 
>> On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
>>> This patch adds support for a virtual watchdog which relies on the
>>> per-cpu hrtimers to pet at regular intervals.
>>>
>>
>> The watchdog subsystem is not intended to detect soft and hard lockups.
>> It is intended to detect userspace issues. A watchdog driver requires
>> a userspace compinent which needs to ping the watchdog on a regular basis
>> to prevent timeouts (and watchdog drivers are supposed to use the
>> watchdog kernel API).
>>
> 
> Thanks for getting back ! I wanted to create a mechanism to detect
> stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
> to create per-CPU binded watchdogs (in the same way as Power PC has
> kernel/watchdog.c).
> The per-CPU watchdog is needed to account for time that the guest is not
> running(either scheduled out or waiting for an event) to prevent spurious
> reset events caused by the watchdog.
> 
>> What you have here is a CPU stall detection mechanism, similar to the
>> existing soft/hard lockup detection mechanism. This code does not
>> belong into the watchdog subsystem; it is similar to the existing
>> hard/softlockup detection code (kernel/watchdog.c) and should reside
>> at the same location.
>>
> 
> I agree that this doesn't belong to the watchdog subsytem but the current
> stall detection mechanism calls through MMIO into a virtual device
> 'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
> something that we should avoid ?
> 

You are introducing qemu,virt-watchdog, so it seems to me that any argument
along that line doesn't really apply.

I think it is more a matter for core kernel developers to discuss and
decide how this functionality is best instantiated. It doesn't _have_
to be a device, after all, just like the current lockup detection
code is not a device. Either case, I am not really the right person
to discuss this since it is a matter of core kernel code which I am
not sufficiently familiar with. All I can say is that watchdog drivers
in the watchdog subsystem have a different scope.

Guenter

>> Having said that, I could imagine a watchdog driver to be used in VMs,
>> but that would be similar to existing watchdog drivers. The easiest way
>> to get there would probably be to just instantiate one of the watchdog
>> devices already supported by qemu.
>>
> 
> I am looking forward for your response,
> 
>> Guenter
> 
> Cheers,
> Sebastian