Message-ID: <aKRNMHCslAt3dx5t@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 19 Aug 2025 12:08:48 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
linux-trace-kernel@...r.kernel.org, Nam Cao <namcao@...utronix.de>,
Tomas Glozar <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>,
Clark Williams <williams@...hat.com>,
John Kacur <jkacur@...hat.com>
Subject: Re: [RFC PATCH 08/17] rv: Add Hybrid Automata monitor type

On 19/08/25 11:48, Gabriele Monaco wrote:
>
>
> On Tue, 2025-08-19 at 11:18 +0200, Juri Lelli wrote:
> > Hi!
> >
> > On 14/08/25 17:08, Gabriele Monaco wrote:
> >
> > ...
> >
> > > +/*
> > > + * ha_monitor_init_env - setup timer and reset all environment
> > > + *
> > > + * Called from a hook in the DA start functions, it supplies the da_mon
> > > + * corresponding to the current ha_mon.
> > > + * Not all hybrid automata require the timer, still set it for simplicity.
> > > + */
> > > +static inline void ha_monitor_init_env(struct da_monitor *da_mon)
> > > +{
> > > +	struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
> > > +
> > > +	ha_monitor_reset_all_stored(ha_mon);
> > > +	if (unlikely(!ha_mon->timer.base))
> > > +		hrtimer_setup(&ha_mon->timer, ha_monitor_timer_callback,
> > > +			      CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > > +}
> >
> > ...
> >
> > > +/*
> > > + * Helper functions to handle the monitor timer.
> > > + * Not all monitors require a timer, in such case the timer will be set up but
> > > + * never armed.
> > > + * Timers start since the last reset of the supplied env or from now if env is
> > > + * not an environment variable. If env was not initialised no timer starts.
> > > + * Timers can expire on any CPU unless the monitor is per-cpu,
> > > + * where we assume every event occurs on the local CPU.
> > > + */
> > > +static inline void ha_start_timer_ns(struct ha_monitor *ha_mon, enum envs env,
> > > +				     u64 expire)
> > > +{
> > > +	int mode = HRTIMER_MODE_REL;
> > > +	u64 passed = 0;
> > > +
> > > +	if (env >= 0 && env < ENV_MAX_STORED) {
> > > +		if (ha_monitor_env_invalid(ha_mon, env))
> > > +			return;
> > > +		passed = ha_get_env(ha_mon, env);
> > > +	}
> > > +	if (RV_MON_TYPE == RV_MON_PER_CPU)
> > > +		mode |= HRTIMER_MODE_PINNED;
> > > +	hrtimer_start(&ha_mon->timer, ns_to_ktime(expire - passed), mode);
> > > +}
> >
> > Also, my only concern with the usage of per-task timers is that
> > reprogramming adds overhead, so I wonder if this gets noticeable when
> > running some kind of performance-sensitive workload in production (as
> > it was reported for dl-server). Did you test such a case?
>
> That's a good point, I need to check the actual overhead..
>
> One thing to note is that this timer is used only on state constraints,
> one could write roughly the same monitor like this:
>
>   +------------------------------------------+
>   |                 enqueued                 |
>   +------------------------------------------+
>                        |
>                        | sched_switch_in;clk < threshold_jiffies
>                        v
>
> or like this:
>
>   +------------------------------------------+
>   |                 enqueued                 |
>   |         clk < threshold_jiffies          |
>   +------------------------------------------+
>                        |
>                        | sched_switch_in
>                        v
>
> the first won't fail as soon as the threshold passes, but will
> eventually fail when the sched_switch_in event occurs. This won't use a
> timer at all (well, mostly, some calls are still made to keep the code
> general, I could improve that if it matters).
>
> Depending on the monitor, the first option could be a lower overhead
> yet valid alternative to the second, if it's guaranteed sched_switch_in
> will eventually come and reaction latency isn't an issue.

Right, as in the first example you have in the docs. I was thinking it
would be cool to possibly replace the hung task monitor with this one,
but then we would again need to check the overhead, since a definition
that relies on a switch_in eventually happening wouldn't work for a
hung task.

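To make the trade-off between the two definitions concrete, here is a rough
userspace model (not the rv API; struct model_mon, NEVER and all function
names are hypothetical, and times are plain nanosecond counters). It only
shows when each variant would be able to report a violation: the guard is
evaluated when sched_switch_in is handled, while the state constraint is
enforced by a timer armed on entering the state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a monitor in the "enqueued" state. */
struct model_mon {
	uint64_t enqueued_at;	/* time the clock was reset, in ns */
	uint64_t threshold;	/* allowed time in the state, in ns */
};

/* Sentinel: the event never happens / no violation is ever reported. */
#define NEVER UINT64_MAX

/* Guard on the edge: checked only when the event arrives at time now. */
static bool guard_violated(const struct model_mon *m, uint64_t now)
{
	assert(now >= m->enqueued_at);
	return now - m->enqueued_at >= m->threshold;
}

/* Time at which the guard variant reports, given when switch_in occurs. */
static uint64_t guard_report_time(const struct model_mon *m, uint64_t switch_in)
{
	if (switch_in == NEVER)
		return NEVER;	/* no event, so nothing ever evaluates the guard */
	return guard_violated(m, switch_in) ? switch_in : NEVER;
}

/* Time at which the timer-backed state constraint reports. */
static uint64_t invariant_report_time(const struct model_mon *m, uint64_t switch_in)
{
	uint64_t deadline = m->enqueued_at + m->threshold;

	/* Leaving the state before the deadline cancels the timer. */
	return switch_in < deadline ? NEVER : deadline;
}
```

With enqueued_at = 100 and threshold = 50, a switch_in at 200 is reported at
200 by the guard and at 150 by the timer; if switch_in never comes, only the
timer variant reports (at 150), which is the hung-task-like case.
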
> > Does this also need to be _HARD on RT for the monitor to work?
>
> That might be something we want configurable actually.. I assume the
> more aggressive the timer is, the more overhead it will have on the
> system.
> Some monitors might be fine with a bit of latency.

It might not only be about latency: if the timer is not hard, its
callback may not be serviced under starvation, in which case the monitor
probably won't react and we won't be able to rely on it.

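As a sketch of what a configurable mode could look like, the snippet below
composes the timer mode from two per-monitor choices. It is a userspace
model: the MOCK_MODE_* constants are stand-ins for the kernel's hrtimer mode
bits (their values here are illustrative), and model_timer_mode() and its
hard_expiry knob are hypothetical, not something the patch provides.

```c
#include <stdbool.h>

/* Mock stand-ins for hrtimer mode bits; values are illustrative only. */
enum mock_hrtimer_mode {
	MOCK_MODE_REL    = 0x01,	/* relative expiry */
	MOCK_MODE_PINNED = 0x02,	/* expire on the local CPU */
	MOCK_MODE_HARD   = 0x08,	/* expire in hard irq context, also on RT */
};

/*
 * Hypothetical helper: build the mode from the monitor type and from a
 * per-monitor "must fire even under starvation" setting.
 */
static int model_timer_mode(bool per_cpu, bool hard_expiry)
{
	int mode = MOCK_MODE_REL;

	if (per_cpu)
		mode |= MOCK_MODE_PINNED;
	if (hard_expiry)
		mode |= MOCK_MODE_HARD;
	return mode;
}
```

In the real code the same choice would presumably have to be applied
consistently where the timer is set up and where it is started; that part is
an assumption and would need checking against the hrtimer API.
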
> For example in the deadline case, I believe, the monitor is not
> supposed to fix anything, but merely report violations. So we don't
> really care to react on time, but to react at all.
>
> I'm going to assess the overhead and see how to offer some more
> configurability.

Thanks!