linux-kernel - Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yk3ARqLLPssVIM2/@google.com>
Date:   Wed, 6 Apr 2022 16:31:02 +0000
From:   Sebastian Ene <sebastianene@...gle.com>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Wim Van Sebroeck <wim@...ux-watchdog.org>,
        Rob Herring <robh+dt@...nel.org>, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-watchdog@...r.kernel.org,
        will@...nel.org, qperret@...gle.com, maz@...nel.org
Subject: Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest
 vCPUs

On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
> Sebastian,
> 

Hello Guenter,

> On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
> > This patch adds support for a virtual watchdog which relies on the
> > per-cpu hrtimers to pet at regular intervals.
> > 
> 
> The watchdog subsystem is not intended to detect soft and hard lockups.
> It is intended to detect userspace issues. A watchdog driver requires
> a userspace compinent which needs to ping the watchdog on a regular basis
> to prevent timeouts (and watchdog drivers are supposed to use the
> watchdog kernel API).
> 

Thanks for getting back ! I wanted to create a mechanism to detect
stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
to create per-CPU binded watchdogs (in the same way as Power PC has
kernel/watchdog.c). 
The per-CPU watchdog is needed to account for time that the guest is not
running(either scheduled out or waiting for an event) to prevent spurious
reset events caused by the watchdog.

> What you have here is a CPU stall detection mechanism, similar to the
> existing soft/hard lockup detection mechanism. This code does not
> belong into the watchdog subsystem; it is similar to the existing
> hard/softlockup detection code (kernel/watchdog.c) and should reside
> at the same location.
>

I agree that this doesn't belong to the watchdog subsytem but the current
stall detection mechanism calls through MMIO into a virtual device
'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
something that we should avoid ?

> Having said that, I could imagine a watchdog driver to be used in VMs,
> but that would be similar to existing watchdog drivers. The easiest way
> to get there would probably be to just instantiate one of the watchdog
> devices already supported by qemu.
>

I am looking forward for your response,

> Guenter

Cheers,
Sebastian