[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4d27225b5a38210a40efcdb8eb778ca0ec3808f1.camel@redhat.com>
Date: Fri, 17 Oct 2025 11:48:17 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>, linux-kernel@...r.kernel.org, Steven
Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
linux-trace-kernel@...r.kernel.org
Cc: Tomas Glozar <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>, Clark
Williams <williams@...hat.com>, John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v2 10/20] rv: Add Hybrid Automata monitor type
On Fri, 2025-10-17 at 10:44 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@...hat.com> writes:
> > diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
> > new file mode 100644
> > index 000000000000..fb885709b727
> > --- /dev/null
> > +++ b/include/rv/ha_monitor.h
> > @@ -0,0 +1,469 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2025-2028 Red Hat, Inc. Gabriele Monaco
> > <gmonaco@...hat.com>
> > + *
> > + * Hybrid automata (HA) monitor functions, to be used together
> > + * with automata models in C generated by the dot2k tool.
> > + *
> > + * This type of monitors extends the Deterministic automata (DA) class by
> > + * adding a set of environment variables (e.g. clocks) that can be used to
> > + * constraint the valid transitions.
> > + *
> > + * The dot2k tool is available at tools/verification/dot2k/
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> non-existent
Right, need to update also in DA.
> > +/*
> > + * ktime_get_ns is expensive, since we usually don't require precise
> > accounting
> > + * of changes within the same event, cache the current time at the
> > beginning of
> > + * the constraint handler and use the cache for subsequent calls.
> > + * Monitors without ns clocks automatically skip this.
> > + */
> > +#ifdef HA_CLK_NS
> > +#define ha_update_ns_cache() ktime_get_ns()
> > +#else
> > +#define ha_update_ns_cache() 0
> > +#endif /* HA_CLK_NS */
>
> ha_update_ns_cache() itself does not cache, its caller does. So I think
> it is misleading to name this "cache".
>
> Why is "update" in the name? Isn't something like ha_get_ns() betters
> describe this macro?
>
Good point.
> > +/*
> > + * The clock variables have 2 different representations in the env_store:
> > + * - The guard representation is the timestamp of the last reset
> > + * - The invariant representation is the timestamp when the invariant
> > expires
>
> Why does guard representation store the last reset?
>
> For example, what if I specifiy the guard as "clk > 500ns". Shouldn't
> the guard representation be "500ns"?
Alright, this is the simplest way I could think to represent clocks, still it
seems confusing.
Let's start from guards (invariants are not special but I'm trying to do
something to keep precision), the value of a clock is the time that passed since
the last reset, as that's when the value is set to 0. Storing that timestamp and
just comparing the difference whenever you need to know the valuation of said
clock seemed the most straightforward thing to me. The clock representation
doesn't include the guard constraint, that is validated during the event using
the current valuation (i.e. now - reset_time).
What is important to note is that, at time of reset, you don't know what guard
is going to fire, you may as well have a state with event A asking for clk<10
and event B requiring clk<20, also the guard may be in a later state and may
depend on the path.
Invariants are bound to the form clk < N, and get "armed" when we reach the
state, from there we know exactly when the invariant is going to expire, so we
can save that (very important when using the timer wheel). Note here that the
expiration isn't exactly N from now, but it's the valuation of the clock (reset
might have occurred a few states earlier, see the nomiss case) subtracted by N,
this is what the "passed" means later.
That said, I couldn't think of a simpler implementation but any suggestion is
welcome, of course.
> > +static inline u64 ha_get_passed_ns(struct ha_monitor *ha_mon, enum envs
> > env,
> > + u64 expire, u64 time_ns)
> > +{
> > + u64 passed = 0;
> > +
> > + if (env < 0 || env >= ENV_MAX_STORED)
> > + return 0;
> > + if (ha_monitor_env_invalid(ha_mon, env))
> > + return 0;
> > + passed = ha_get_env(ha_mon, env, time_ns);
> > + ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
>
> The function is named *get*() which indicates that it only reads. But it
> also writes.
Right.
>
> > +/*
> > + * Retrieve the last reset time (guard representation) from the invariant
> > + * representation (expiration).
> > + * It the caller's responsibility to make sure the storage was actually in
> > the
> > + * invariant representation (e.g. the current state has an invariant).
> > + * The provided value must be the same used when starting the invariant.
> > + *
> > + * This function's access to the storage is NOT atomic, due to the rarity
> > when
> > + * this is used. If a monitor allows writes concurrent to this, likely
> > + * other things are broken and need rethinking the model or additional
> > locking.
>
> Does atomic_sub() solves your "NOT atomic" problem?
Kinda, it would solve the problem for this specific subtraction, but racing
handlers could still lead to problems although the subtraction is "correct".
Since this is the only time the env storage needs to be an atomic_t and it's
fairly rare (only complicated models require calling this function at all,
others are happy with READ_ONCE/WRITE_ONCE) I didn't want to change the storage
implementation for some perceived safety.
I wrote that comment exactly to motivate why we aren't using atomic_t, but I
should probably reword that. Does this make sense to you?
>
> > +static inline void ha_start_timer_jiffy(struct ha_monitor *ha_mon, enum
> > envs env,
> > + u64 expire, u64 time_ns)
> > +{
> > + u64 passed = ha_get_passed_jiffy(ha_mon, env, expire, time_ns);
> > +
> > + mod_timer(&ha_mon->timer, get_jiffies_64() + expire - passed);
> > +}
>
> I don't understand this "passed" thing. I see this function being called
> in ha_setup_invariants() of stall monitor. Stall monitor is validating
> that the task does not stay in "enqueued" state for more than
> "threshold_jiffies". If so, what is the purpose of "passed"? From my
> understanding, it should be just
>
> static inline void ha_start_timer_jiffy(struct ha_monitor *ha_mon, enum envs
> env,
> u64 expire, u64 time_ns)
> {
> mod_timer(&ha_mon->timer, get_jiffies_64() + expire);
> }
>
> what do I miss?
As mentioned before, this is true for the stall case, where the reset occurred
when reaching the state with the invariant (passed is close to 0), if you look
at the nomiss case, reset happens before being ready (its invariant would have
passed close to 0), but the same invariant is enforced in running, here we will
see a passed far from 0 and need to take that into account when setting the
invariant.
This is by the way what made necessary caching the ns time, otherwise we'd end
up reading the ktime multiple times in the same handler for fairly no reason.
I've been stuck on this implementation for a while so I definitely need an
outsider perspective to see if things make sense!
Thanks,
Gabriele
Powered by blists - more mailing lists