[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250416115658.AkFAts-B@linutronix.de>
Date: Wed, 16 Apr 2025 13:56:58 +0200
From: Nam Cao <namcao@...utronix.de>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
john.ogness@...utronix.de
Subject: Re: [PATCH v3 13/22] rv: Add support for LTL monitors
On Wed, Apr 16, 2025 at 11:34:53AM +0200, Gabriele Monaco wrote:
> On Wed, 2025-04-16 at 08:51 +0200, Nam Cao wrote:
> > #endif /* CONFIG_DA_MON_EVENTS_ID */
> > +#if CONFIG_LTL_MON_EVENTS_ID
> > +TRACE_EVENT(event_ltl_monitor_id,
> > +
> > + TP_PROTO(struct task_struct *task, char *states, char
> > *atoms, char *next),
> > +
> > + TP_ARGS(task, states, atoms, next),
> > +
> > + TP_STRUCT__entry(
> > + __string(comm, task->comm)
> > + __field(pid_t, pid)
> > + __string(states, states)
> > + __string(atoms, atoms)
> > + __string(next, next)
> > + ),
> > +
> > + TP_fast_assign(
> > + __assign_str(comm);
> > + __entry->pid = task->pid;
> > + __assign_str(states);
> > + __assign_str(atoms);
> > + __assign_str(next);
> > + ),
> > +
> > + TP_printk("%s[%d]: (%s) x (%s) -> (%s)", __get_str(comm),
> > __entry->pid, __get_str(states),
> > + __get_str(atoms), __get_str(next))
> > +);
> > +
> > +TRACE_EVENT(error_ltl_monitor_id,
> > +
> > + TP_PROTO(struct task_struct *task),
> > +
> > + TP_ARGS(task),
> > +
> > + TP_STRUCT__entry(
> > + __string(comm, task->comm)
> > + __field(pid_t, pid)
> > + ),
> > +
> > + TP_fast_assign(
> > + __assign_str(comm);
> > + __entry->pid = task->pid;
> > + ),
> > +
> > + TP_printk("%s[%d]: violation detected", __get_str(comm),
>
> In your workflow you're probably using events and errors together, but
> wouldn't it help printing the atoms together with the violation
> detected?
> At least to give a clue on the error in case the user doesn't want to
> see the entire trace (which might be needed for a full debug though).
>
> The same could be said from reactors, the user doesn't have much
> information to infer what went wrong.
Actually my intention for the "event" tracepoints are only for debugging
the monitor itself. I don't want to bother users with the Büchi automaton,
because that's implementation details.
The "error" tracepoints should be enough for identifying problems with
realtime applications. Because errors from the monitors are unambiguous:
- pagefault monitor: error means an RT task is raising pagefault
- sleep monitor: error means an RT task is delayed unboundedly
That and a stacktrace (e.g. from perf) is enough to understand the problem.
That was all I need to identifying problems with pipewire using the
monitors.
In the future, we can have other monitors whose warnings are ambiguous, and
a more detailed error message will be necessary. But for now, I think we
can keep it simple.
> > +def abbreviate_atoms(atoms: list[str]) -> list[str]:
> > + abbrs = list()
> > + for atom in atoms:
> > + size = 1
> > + while True:
> > + abbr = atom[:size]
> > + if sum(a.startswith(abbr) for a in atoms) == 1:
> > + break
> > + size += 1
> > + abbrs.append(abbr.lower())
> > + return abbrs
>
> I get this is just a matter of preference, so feel free to ignore my
> suggestion.
> This abbreviation algorithm doesn't work too well with atoms starting
> with the same substring and can produce some unexpected result:
>
> LTL_BLOCK_ON_RT_MUTEX: b,
> LTL_KERNEL_THREAD: ke,
> LTL_KTHREAD_SHOULD_STOP: kt,
> LTL_NANOSLEEP: n,
> LTL_PI_FUTEX: p,
> LTL_RT: r,
> LTL_SLEEP: s,
> LTL_TASK_IS_MIGRATION: task_is_m,
> LTL_TASK_IS_RCU: task_is_r,
> LTL_WAKE: wa,
> LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO: woken_by_e,
> LTL_WOKEN_BY_HARDIRQ: woken_by_h,
> LTL_WOKEN_BY_NMI: woken_by_n,
>
> "woken_by_*" and "task_is_*" atom can get unnecessarily long and
> while reading "kt" I might think about kernel_thread.
>
> I was thinking about something like:
>
> LTL_BLOCK_ON_RT_MUTEX: b_o_r_m
> LTL_KERNEL_THREAD: k_t
> LTL_KTHREAD_SHOULD_STOP: k_s_s
> LTL_NANOSLEEP: n
> LTL_PI_FUTEX: p_f
> LTL_RT: r
> LTL_SLEEP: s
> LTL_TASK_IS_MIGRATION: t_i_m
> LTL_TASK_IS_RCU: t_i_r
> LTL_WAKE: w
> LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO: w_b_e_o_h_p
> LTL_WOKEN_BY_HARDIRQ: w_b_h
> LTL_WOKEN_BY_NMI: w_b_n
>
> or even
>
> LTL_BLOCK_ON_RT_MUTEX: b_m
> LTL_KERNEL_THREAD: k_t
> LTL_KTHREAD_SHOULD_STOP: k_s_s
> LTL_NANOSLEEP: n
> LTL_PI_FUTEX: p_f
> LTL_RT: r
> LTL_SLEEP: s
> LTL_TASK_IS_MIGRATION: t_m
> LTL_TASK_IS_RCU: t_r
> LTL_WAKE: w
> LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO: w_e_h_p
> LTL_WOKEN_BY_HARDIRQ: w_h
> LTL_WOKEN_BY_NMI: w_n
>
> I used the following code to come up with this:
>
> def abbreviate_atoms(atoms: list[str]) -> list[str]:
> # completely arbitrary..
> skip = [ "is", "by", "or", "and" ]
> def abbr (n, s):
> return '_'.join(word[:n] for word in s.lower().split('_') if word not in skip)
> for n in range(1, 32):
> abbrs = [abbr(n, a) for a in atoms]
> if len(abbrs) == len(set(abbrs)):
> return abbrs
>
> Which could even be tuned to use 2 letters per block instead of 1
> (improving readability by a lot actually)..
> 'bl_on_rt_mu', 'ke_th', 'kt_sh_st', 'na', 'pi_fu', 'rt', 'sl', 'ta_mi',
> 'ta_rc', 'wa', 'wo_eq_hi_pr', 'wo_ha', 'wo_nm'
>
> What do you think?
Yes, this would be very nice. But as mentioned, this is mainly for
debugging the monitors, not for end users. Therefore I don't want to spend
too much energy on it.
Let me see how "nice" we can make it without spending too many hours.
Thanks for the suggestion!
Best regards,
Nam
Powered by blists - more mailing lists