Message-ID: <Ysfoc2YhQCLge1iY@google.com>
Date: Fri, 8 Jul 2022 08:18:59 +0000
From: Sebastian Ene <sebastianene@...gle.com>
To: Will Deacon <will@...nel.org>
Cc: Rob Herring <robh+dt@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arnd Bergmann <arnd@...db.de>,
Dragan Cvetic <dragan.cvetic@...inx.com>,
linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
maz@...nel.org, vdonnefort@...gle.com,
Guenter Roeck <linux@...ck-us.net>
Subject: Re: [PATCH v10 2/2] misc: Add a mechanism to detect stalls on guest vCPUs
On Thu, Jul 07, 2022 at 07:27:38PM +0100, Will Deacon wrote:
> Hi Sebastian,
>
> On Thu, Jul 07, 2022 at 03:42:27PM +0000, Sebastian Ene wrote:
Hi Will,
> > This driver creates per-cpu hrtimers which are required to do the
> > periodic 'pet' operation. On a conventional watchdog-core driver, the
> > userspace is responsible for delivering the 'pet' events by writing to
> > the particular /dev/watchdogN node. In this case we require a strong
> > thread affinity to be able to account for lost time on a per-vCPU basis.
> >
> > This part of the driver is the 'frontend' which is responsible for
> > delivering the periodic 'pet' events, configuring the virtual peripheral
> > and listening for cpu hotplug events. The other part of the driver is
> > an emulated MMIO device which is part of the KVM virtual machine
> > monitor and this part accounts for lost time by looking at the
> > /proc/{}/task/{}/stat entries.
> >
> > Signed-off-by: Sebastian Ene <sebastianene@...gle.com>
> > ---
> > drivers/misc/Kconfig | 14 ++
> > drivers/misc/Makefile | 1 +
> > drivers/misc/vcpu_stall_detector.c | 209 +++++++++++++++++++++++++++++
> > 3 files changed, 224 insertions(+)
> > create mode 100644 drivers/misc/vcpu_stall_detector.c
>
> Thanks for addressing all of my feedback on v9 so promptly:
>
> Reviewed-by: Will Deacon <will@...nel.org>
>
> Just one question on this part:
>
> > +static enum hrtimer_restart
> > +vcpu_stall_detect_timer_fn(struct hrtimer *hrtimer)
> > +{
> > + u32 ticks, ping_timeout_ms;
> > +
> > + /* Reload the stall detector counter register every
> > + * `ping_timeout_ms` to prevent the virtual device
> > + * from decrementing it to 0. The virtual device decrements this
> > + * register at 'clock_freq_hz' frequency.
> > + */
> > + ticks = vcpu_stall_config.clock_freq_hz *
> > + vcpu_stall_config.stall_timeout_sec;
>
> It would be quite easy for this to overflow 32 bits, so perhaps it would
> be best to check the values from the DT during probe and fall back to the
> defaults (with a warning) if the result of the multiplication is out
> of range for the 32-bit register.
>
> What do you think? My review stands in any case, as this shouldn't happen
> in practice with sensible values.
>
Good point! I think falling back to defaults in case the values from the
DT exceed a limit is a good approach. I will do that in the next
version.
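
As a rough illustration only (this is not the code from any posted
version of the patch), a probe-time check along those lines could widen
the multiplication to 64 bits and compare the result against the range
of the 32-bit register. The struct, the helper name
stall_config_sanitize() and the default values below are hypothetical
and written as plain userspace C so the logic can be tried in
isolation; in the driver itself something like check_mul_overflow()
from <linux/overflow.h> together with dev_warn() would likely be the
more natural fit.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical defaults, chosen only for this sketch. */
#define DEFAULT_CLOCK_HZ	10U
#define DEFAULT_TIMEOUT_SEC	8U

/* Stand-in for the two values read from the DT (assumed layout). */
struct stall_config {
	uint32_t clock_freq_hz;
	uint32_t stall_timeout_sec;
};

/*
 * Validate that clock_freq_hz * stall_timeout_sec fits in the 32-bit
 * counter register. If it does not, warn and fall back to the
 * defaults. Returns 1 if the defaults were substituted, 0 otherwise.
 */
static int stall_config_sanitize(struct stall_config *cfg)
{
	/* Do the multiplication in 64 bits so it cannot wrap. */
	uint64_t ticks = (uint64_t)cfg->clock_freq_hz *
			 cfg->stall_timeout_sec;

	if (ticks > UINT32_MAX) {
		fprintf(stderr,
			"stall detector: %u Hz * %u s overflows 32 bits, using defaults\n",
			(unsigned int)cfg->clock_freq_hz,
			(unsigned int)cfg->stall_timeout_sec);
		cfg->clock_freq_hz = DEFAULT_CLOCK_HZ;
		cfg->stall_timeout_sec = DEFAULT_TIMEOUT_SEC;
		return 1;
	}

	return 0;
}
```

With this in place the timer callback could keep its 32-bit
multiplication unchanged, since the product is known to be in range
once probe has completed.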
> Will
Thanks,
Seb