Message-ID: <d907bcaa640a44ecb739b3253df49e16bcd4e38d.camel@redhat.com>
Date: Tue, 19 Aug 2025 12:53:29 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>, linux-trace-kernel@...r.kernel.org,
Nam Cao <namcao@...utronix.de>, Tomas Glozar <tglozar@...hat.com>, Juri
Lelli <jlelli@...hat.com>, Clark Williams <williams@...hat.com>, John
Kacur <jkacur@...hat.com>
Subject: Re: [RFC PATCH 08/17] rv: Add Hybrid Automata monitor type
On Tue, 2025-08-19 at 12:08 +0200, Juri Lelli wrote:
> On 19/08/25 11:48, Gabriele Monaco wrote:
> > That's a good point, I need to check the actual overhead..
> >
> > One thing to note is that this timer is used only on state
> > constraints,
> > one could write roughly the same monitor like this:
> >
> > +------------------------------------------+
> > | enqueued |
> > +------------------------------------------+
> > |
> > | sched_switch_in;clk < threshold_jiffies
> > v
> >
> > or like this:
> >
> > +------------------------------------------+
> > | enqueued |
> > | clk < threshold_jiffies |
> > +------------------------------------------+
> > |
> > | sched_switch_in
> > v
> >
> > the first won't fail as soon as the threshold passes, but will
> > eventually fail when the sched_switch_in event occurs. This won't
> > use a timer at all (well, mostly, some calls are still made to keep
> > the code general, I could improve that if it matters).
> >
> > Depending on the monitor, the first option could be a lower
> > overhead yet valid alternative to the second, if it's guaranteed
> > sched_switch_in will eventually come and reaction latency isn't an
> > issue.
>
> Right, as in the first example you have in the docs. I was thinking
> it would be cool to possibly replace the hung task monitor with this
> one, but again we would need to check for overhead, as the definition
> that does expect a switch_in to eventually happen wouldn't work in
> this case.
Yeah, if the overhead is really high that might be an option. Although
the monitor might become a bit pointless then: if a task starves
forever, no error will be reported.
If that's a real issue, I might look at other places to check the
constraints (the tick, perhaps).
> > > Does this also need to be _HARD on RT for the monitor to work?
> >
> > That might be something we want configurable actually.. I assume
> > the more aggressive the timer is, the more overhead it will have on
> > the system.
> > Some monitors might be fine with a bit of latency.
>
> It might not only be about latency: if the callback timer is not
> serviced in case of starvation (if it's not hard), then the monitor
> probably won't react and we won't be able to rely on it.
I think I hit that in some conditions and changed ha_cancel_timer()
to handle this case.
After leaving a state that armed a timer, we always cancel it (to
avoid it expiring outside the state); at that point, if the timer had
expired but the callback didn't run, the monitor fails.
Again, if the monitor never leaves the state, we'd never report a
failure, but I'm not sure how common that is.
Thanks,
Gabriele