[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b1c8713cd10358355d4086e6d80ce6f10d295fe8.camel@svanheule.net>
Date: Thu, 14 Oct 2021 12:26:59 +0200
From: Sander Vanheule <sander@...nheule.net>
To: Guenter Roeck <linux@...ck-us.net>
Cc: linux-watchdog@...r.kernel.org, devicetree@...r.kernel.org,
Wim Van Sebroeck <wim@...ux-watchdog.org>,
Rob Herring <robh+dt@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] watchdog: Add Realtek Otto watchdog timer
On Wed, 2021-10-13 at 14:03 -0700, Guenter Roeck wrote:
> On 10/13/21 12:46 PM, Sander Vanheule wrote:
> > On Wed, 2021-10-13 at 11:48 -0700, Guenter Roeck wrote:
> > > On Wed, Oct 13, 2021 at 03:29:00PM +0200, Sander Vanheule wrote:
> > [...]
> >
> > > >
> > > > diff --git a/drivers/watchdog/realtek_otto_wdt.c
> > > > b/drivers/watchdog/realtek_otto_wdt.c
> > > > new file mode 100644
> > > > index 000000000000..64c9cba6b0b1
> > > > --- /dev/null
> > > > +++ b/drivers/watchdog/realtek_otto_wdt.c
> > > > @@ -0,0 +1,411 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > +
> > > > +/*
> > > > + * Realtek Otto MIPS platform watchdog
> > > > + *
> > > > + * Watchdog timer that will reset the system after timeout, using the
> > > > selected
> > > > + * reset mode.
> > > > + *
> > > > + * Counter scaling and timeouts:
> > > > + * - Base prescale of (2 << 25), providing tick duration T_0: 168ms @
> > > > 200MHz
> > > > + * - PRESCALE: logarithmic prescaler adding a factor of {1, 2, 4, 8}
> > > > + * - Phase 1: Times out after (PHASE1 + 1) × PRESCALE × T_0
> > > > + * Generates an interrupt, WDT cannot be stopped after phase 1
> > > > + * - Phase 2: starts after phase 1, times out after (PHASE2 + 1) ×
> > > > PRESCALE × T_0
> > > > + * Resets the system according to RST_MODE
> > >
> > > Why is there a phase2 interrupt if phase2 resets the chip ?
> > >
> >
> > The SoC's reset controller has an interrupt line for phase2, even though
> > then it then the
> > WDT also resets the system. I don't have any documentation about this
> > peripheral; just
> > some vendor code and there the phase2 interrupt isn't enabled. I mainly
> > added it here for
> > completeness.
> >
>
> It seems pointless to mandate an interrupt just for completeness.
Okay, then I will just drop it here. As I understand, the bindings should be as
complete as possible, so I think the phase2 interrupt definition should remain
there?
>
> > One thing to note is that after CPU or software reset (not SoC reset) the
> > phase2 flag in
> > OTTO_WDT_REG_INTR will be set. That's why I always clear it in
> > otto_wdt_probe(), because
> > otherwise enabling the interrupt line would trigger otto_wdt_phase2_isr().
> > On warm
> > restarts this bit could be used to determine if there was a WDT timeout, but
> > not if the
> > WDT is configured for cold restarts (i.e. full SoC reset).
> >
> > >
> > [...]
> > > > +
> > > > + raw_spin_lock_irqsave(&ctrl->lock, flags);
> > > > + v = ioread32(ctrl->base + OTTO_WDT_REG_CTRL);
> > > > + v |= OTTO_WDT_CTRL_ENABLE;
> > > > + iowrite32(v, ctrl->base + OTTO_WDT_REG_CTRL);
> > > > + raw_spin_unlock_irqrestore(&ctrl->lock, flags);
> > >
> > > Is it really necessary to disable interrupts for those operations ?
> >
> > The ISR routines only use REG_INTR, which isn't modified anywhere else
> > (outside of probing
> > the device). I will replace these with raw_spin_{lock,unlock} throughout.
> >
>
> In that case you should not need any locks at all since the watchdog core
> ensures
> that the device is opened only once (and thus only one entity can enable or
> disable
> the watchdog).
If there is an external guarantee that at most one of {set_timeout,
set_pretimeout, enable, disable} will be called at a time, I can indeed drop the
lock. I had added the lock initially because of the read-modify-write operations
on the control register these ops perform.
>
> > [...]
> > > > +/*
> > > > + * The timer asserts the PHASE1/PHASE2 IRQs when the number of ticks
> > > > exceeds
> > > > + * the value stored in those fields. This means the timer will run for
> > > > at least
> > > > + * one tick, so small values need to be clamped to correctly reflect
> > > > the timeout.
> > > > + */
> > > > +static inline unsigned int div_round_ticks(unsigned int val, unsigned
> > > > int
> > > > tick_duration,
> > > > + unsigned int min_ticks)
> > > > +{
> > > > + return max(min_ticks, DIV_ROUND_CLOSEST(val, tick_duration));
> > >
> > > Are you sure that DIV_ROUND_CLOSEST is appropriate in those calculations
> > > (instead of DIV_ROUND_UP or DIV_ROUND_DOWN) ?
> > >
> > [...]
> >
> > > > +
> > > > + timeout_ms = total_ticks * tick_ms;
> > > > + ctrl->wdev.timeout = DIV_ROUND_CLOSEST(timeout_ms, 1000);
> > > > +
> > >
> > > That means the actual timeout (and pretimeout) can be slightly larger
> > > than the real timeout. Is this really what you want ?
> >
> > Is it a problem if the WDT times out later than specified by
> > watchdog_device.(pre)timeout?
> > I can see that premature timeouts would be an issue, but I don't suppose
> > it's problematic
> > if the WDT is pinged (slightly) sooner than actually required?
> >
>
> I am not concerned with early pings. However, if the timeout limit is set to a
> value
> lardger than the real timeout (eg the real timeout is 25.6 seconds and the
> timeout
> value is set to 26 seconds), the reset may occur a bit early. Granted, it
> doesn't
> matter much, but most driver authors would ensure that the timeout is set to
> 25 seconds
> (ie rounded down) in that situation.
I'll replace tick rounding with DIV_ROUND_UP, and timeout rounding with regular
flooring division. This results in a few timeout values being rounded up for the
coarsest tick duration, but those are then stable values.
Best,
Sander
>
> > The coarsest ticks are 1342 ms, so it is not always possible to provide the
> > requested
> > (pre)timeout value, independent of the rounding scheme. Although I think it
> > should be
> > possible to replace timeout rounding by DIV_ROUND_UP (of total_ticks_ms),
> > and pretimeout
> > rounding by DIV_ROUND_DOWN (of phase2_ticks_ms), and keep stable timeouts
> > when alternating
> > between set_timeout/set_pretimeout.
> >
> > >
> > > > + pretimeout_ms = phase2_ticks * tick_ms;
> > > > + ctrl->wdev.pretimeout = DIV_ROUND_CLOSEST(pretimeout_ms, 1000);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int otto_wdt_set_timeout(struct watchdog_device *wdev, unsigned
> > > > int val)
> > > > +{
> > > > + struct otto_wdt_ctrl *ctrl = watchdog_get_drvdata(wdev);
> > > > + unsigned long flags;
> > > > + unsigned int ret;
> > > > +
> > > > + if (watchdog_timeout_invalid(wdev, val))
> > > > + return -EINVAL;
> > >
> > > This is not supposed to happen because the calling code already performs
> > > range checks.
> >
> > Right, I will drop the redundant check here and in set_pretimeout.
> >
> > >
> > [...]
> > > > +static int otto_wdt_restart(struct watchdog_device *wdev, unsigned long
> > > > reboot_mode,
> > > > + void *data)
> > > > +{
> > > > + struct otto_wdt_ctrl *ctrl = watchdog_get_drvdata(wdev);
> > > > + u32 reset_mode;
> > > > + u32 v;
> > > > +
> > > > + devm_free_irq(ctrl->dev, ctrl->irq_phase1, ctrl);
> > > > +
> > >
> > > Why is this needed (instead of, say, disabling the interrupt) ?
> >
> > Disabling the interrupt should actually be enough. I'll replace the
> > devm_free_irq() with
> > disable_irq(). Somehow I didn't find disable_irq(), even though that was
> > what I was
> > looking for...
> >
> > [...]
> > > > +
> > > > + /*
> > > > + * Since pretimeout cannot be disabled, min_timeout is twice the
> > > > + * subsystem resolution. max_timeout is 44s at a bus clock of
> > > > 200MHz.
> > > > + */
> > > > + ctrl->wdev.min_timeout = 2;
> > > > + max_tick_ms = otto_wdt_tick_ms(ctrl, OTTO_WDT_PRESCALE_MAX);
> > > > + ctrl->wdev.max_timeout =
> > > > + DIV_ROUND_CLOSEST(max_tick_ms *
> > > > OTTO_WDT_TIMEOUT_TICKS_MAX, 1000);
> > >
> > > Any reason for using max_timeout instead of max_hw_heartbeat_ms ?
> >
> > I must have missed this when I was looking at watchdog_device. Makes sense
> > to use
> > max_hw_heartbeat_ms since that reflects the actual value more accurately.
> >
> > >
> >
> > Thanks for the feedback!
> >
> > Best,
> > Sander
> >
>
Powered by blists - more mailing lists