linux-kernel - Re: [PATCH v10 2/2] misc: Add a mechanism to detect stalls on guest vCPUs

Message-ID: <Ysfoc2YhQCLge1iY@google.com>
Date:   Fri, 8 Jul 2022 08:18:59 +0000
From:   Sebastian Ene <sebastianene@...gle.com>
To:     Will Deacon <will@...nel.org>
Cc:     Rob Herring <robh+dt@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Arnd Bergmann <arnd@...db.de>,
        Dragan Cvetic <dragan.cvetic@...inx.com>,
        linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
        maz@...nel.org, vdonnefort@...gle.com,
        Guenter Roeck <linux@...ck-us.net>
Subject: Re: [PATCH v10 2/2] misc: Add a mechanism to detect stalls on guest
 vCPUs

On Thu, Jul 07, 2022 at 07:27:38PM +0100, Will Deacon wrote:
> Hi Sebastian,
> 
> On Thu, Jul 07, 2022 at 03:42:27PM +0000, Sebastian Ene wrote:

Hi Will,

> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per-vCPU basis.
> > 
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver is
> > an emulated MMIO device which is part of the KVM virtual machine
> > monitor and this part accounts for lost time by looking at the
> > /proc/{}/task/{}/stat entries.
> > 
> > Signed-off-by: Sebastian Ene <sebastianene@...gle.com>
> > ---
> >  drivers/misc/Kconfig               |  14 ++
> >  drivers/misc/Makefile              |   1 +
> >  drivers/misc/vcpu_stall_detector.c | 209 +++++++++++++++++++++++++++++
> >  3 files changed, 224 insertions(+)
> >  create mode 100644 drivers/misc/vcpu_stall_detector.c
> 
> Thanks for addressing all of my feedback on v9 so promptly:
> 
> Reviewed-by: Will Deacon <will@...nel.org>
> 
> Just one question on this part:
> 
> > +static enum hrtimer_restart
> > +vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
> > +{
> > +	u32 ticks, ping_timeout_ms;
> > +
> > +	/* Reload the stall detector counter register every
> > +	 * `ping_timeout_ms` to prevent the virtual device
> > +	 * from decrementing it to 0. The virtual device decrements this
> > +	 * register at 'clock_freq_hz' frequency.
> > +	 */
> > +	ticks = vcpu_stall_config.clock_freq_hz *
> > +		vcpu_stall_config.stall_timeout_sec;
> 
> It would be quite easy for this to overflow 32 bits, so perhaps it would
> be best to check the values from the DT during probe and fall back to the
> defaults (with a warning) if the result of the multiplication is out
> of range for the 32-bit register.
> 
> What do you think? My review stands in any case, as this shouldn't happen
> in practice with sensible values.
> 

Good point! I think falling back to the defaults when the values from the
DT exceed the limit is a good approach. I will do that in the next
version.
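
For illustration only, a minimal userspace sketch of the kind of range
check discussed above, not the code from the next version. The struct,
the helper name and the default values are hypothetical; only the field
names `clock_freq_hz` and `stall_timeout_sec` come from the patch. The
product is computed in 64 bits so that it can be compared against the
32-bit register limit before it is truncated:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical defaults; the driver's real defaults are not shown here. */
#define EXAMPLE_DEFAULT_CLOCK_HZ	10U
#define EXAMPLE_DEFAULT_TIMEOUT_SEC	8U

/* Hypothetical stand-in for the driver's configuration structure. */
struct example_stall_config {
	uint32_t clock_freq_hz;
	uint32_t stall_timeout_sec;
};

/*
 * Return true if clock_freq_hz * stall_timeout_sec fits in a 32-bit
 * register. Otherwise reset both fields to the defaults and return
 * false so that the caller can print a warning.
 */
static bool example_validate_config(struct example_stall_config *cfg)
{
	/* Widen before multiplying so the product cannot wrap. */
	uint64_t ticks = (uint64_t)cfg->clock_freq_hz * cfg->stall_timeout_sec;

	if (ticks > UINT32_MAX) {
		cfg->clock_freq_hz = EXAMPLE_DEFAULT_CLOCK_HZ;
		cfg->stall_timeout_sec = EXAMPLE_DEFAULT_TIMEOUT_SEC;
		return false;
	}

	return true;
}
```

In the driver itself, a check along these lines would presumably run once
at probe time, after the DT properties have been read.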

> Will

Thanks,
Seb