linux-kernel - Re: [PATCH v3 13/22] rv: Add support for LTL monitors

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4edad1940b2d05f1997895d4bbc11f02a921e8e5.camel@redhat.com>
Date: Wed, 16 Apr 2025 11:34:53 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>, Steven Rostedt <rostedt@...dmis.org>, 
	linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: john.ogness@...utronix.de
Subject: Re: [PATCH v3 13/22] rv: Add support for LTL monitors


On Wed, 2025-04-16 at 08:51 +0200, Nam Cao wrote:
> While attempting to implement DA monitors for some complex
> specifications,
> deterministic automaton is found to be inappropriate as the
> specification
> language. The automaton is complicated, hard to understand, and
> error-prone.
> 
> For these cases, linear temporal logic is more suitable as the
> specification language.
> 
> Add support for linear temporal logic runtime verification monitor.
> 
> For all the details, see the documentations added by this commit.
> 
> Signed-off-by: Nam Cao <namcao@...utronix.de>
> ---
>  Documentation/trace/rv/index.rst              |   1 +
>  .../trace/rv/linear_temporal_logic.rst        | 119 ++++
>  Documentation/trace/rv/monitor_synthesis.rst  | 141 ++++-
>  include/linux/rv.h                            |  62 +-
>  include/rv/ltl_monitor.h                      | 184 ++++++
>  kernel/fork.c                                 |   5 +-
>  kernel/trace/rv/Kconfig                       |   7 +
>  kernel/trace/rv/rv_trace.h                    |  47 ++
>  tools/verification/rvgen/.gitignore           |   3 +
>  tools/verification/rvgen/Makefile             |   2 +
>  tools/verification/rvgen/__main__.py          |   3 +-
>  tools/verification/rvgen/rvgen/ltl2ba.py      | 558
> ++++++++++++++++++
>  tools/verification/rvgen/rvgen/ltl2k.py       | 242 ++++++++
>  .../rvgen/rvgen/templates/ltl2k/main.c        | 102 ++++
>  .../rvgen/rvgen/templates/ltl2k/trace.h       |  14 +
>  15 files changed, 1465 insertions(+), 25 deletions(-)
>  create mode 100644 Documentation/trace/rv/linear_temporal_logic.rst
>  create mode 100644 include/rv/ltl_monitor.h
>  create mode 100644 tools/verification/rvgen/.gitignore
>  create mode 100644 tools/verification/rvgen/rvgen/ltl2ba.py
>  create mode 100644 tools/verification/rvgen/rvgen/ltl2k.py
>  create mode 100644
> tools/verification/rvgen/rvgen/templates/ltl2k/main.c
>  create mode 100644
> tools/verification/rvgen/rvgen/templates/ltl2k/trace.h
> 
> [...]
> diff --git a/include/rv/ltl_monitor.h b/include/rv/ltl_monitor.h
> new file mode 100644
> index 000000000000..78f5a1197665
> --- /dev/null
> +++ b/include/rv/ltl_monitor.h
> @@ -0,0 +1,184 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/**
> + * This file must be combined with the $(MODEL_NAME).h file
> generated by
> + * tools/verification/rvgen.
> + */
> +
> +#include <linux/args.h>
> +#include <linux/rv.h>
> +#include <linux/stringify.h>
> +#include <linux/seq_buf.h>
> +#include <rv/instrumentation.h>
> +#include <trace/events/task.h>
> +#include <trace/events/sched.h>
> +
> +#ifndef MONITOR_NAME
> +#error "MONITOR_NAME macro is not defined. Did you include
> $(MODEL_NAME).h generated by rvgen?"
> +#endif
> +
> +#ifdef CONFIG_RV_REACTORS
> +#define RV_MONITOR_NAME CONCATENATE(rv_, MONITOR_NAME)
> +static struct rv_monitor RV_MONITOR_NAME;
> +
> +static void rv_cond_react(struct task_struct *task)
> +{
> +	if (!rv_reacting_on() || !RV_MONITOR_NAME.react)
> +		return;
> +	RV_MONITOR_NAME.react("rv: "__stringify(MONITOR_NAME)":
> %s[%d]: violation detected\n",
> +			      task->comm, task->pid);
> +}

What about adding more context here (see below).

> +#else
> +static void rv_cond_react(struct task_struct *task)
> +{
> +}
> +#endif
> +
> [...]
> diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
> index 99c3801616d4..f9fb848bae91 100644
> --- a/kernel/trace/rv/rv_trace.h
> +++ b/kernel/trace/rv/rv_trace.h
> @@ -127,6 +127,53 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
>  // Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
>  
>  #endif /* CONFIG_DA_MON_EVENTS_ID */
> +#if CONFIG_LTL_MON_EVENTS_ID
> +TRACE_EVENT(event_ltl_monitor_id,
> +
> +	TP_PROTO(struct task_struct *task, char *states, char
> *atoms, char *next),
> +
> +	TP_ARGS(task, states, atoms, next),
> +
> +	TP_STRUCT__entry(
> +		__string(comm, task->comm)
> +		__field(pid_t, pid)
> +		__string(states, states)
> +		__string(atoms, atoms)
> +		__string(next, next)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(comm);
> +		__entry->pid = task->pid;
> +		__assign_str(states);
> +		__assign_str(atoms);
> +		__assign_str(next);
> +	),
> +
> +	TP_printk("%s[%d]: (%s) x (%s) -> (%s)", __get_str(comm),
> __entry->pid, __get_str(states),
> +		  __get_str(atoms), __get_str(next))
> +);
> +
> +TRACE_EVENT(error_ltl_monitor_id,
> +
> +	TP_PROTO(struct task_struct *task),
> +
> +	TP_ARGS(task),
> +
> +	TP_STRUCT__entry(
> +		__string(comm, task->comm)
> +		__field(pid_t, pid)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(comm);
> +		__entry->pid = task->pid;
> +	),
> +
> +	TP_printk("%s[%d]: violation detected", __get_str(comm),

In your workflow you're probably using events and errors together, but
wouldn't it help printing the atoms together with the violation
detected?
At least to give a clue on the error in case the user doesn't want to
see the entire trace (which might be needed for a full debug though).

The same could be said from reactors, the user doesn't have much
information to infer what went wrong.

> __entry->pid)
> +);
> +// Add new monitors based on CONFIG_LTL_MON_EVENTS_ID here
> +#endif /* CONFIG_LTL_MON_EVENTS_ID */
>  #endif /* _TRACE_RV_H */
>  
>  /* This part must be outside protection */
> [...]
> diff --git a/tools/verification/rvgen/rvgen/ltl2k.py
> b/tools/verification/rvgen/rvgen/ltl2k.py
> new file mode 100644
> index 000000000000..e2a7c4bcccc9
> --- /dev/null
> +++ b/tools/verification/rvgen/rvgen/ltl2k.py
> @@ -0,0 +1,242 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +from pathlib import Path
> +from . import generator
> +from . import ltl2ba
> +
> +COLUMN_LIMIT = 100
> +
> +def line_len(line: str) -> int:
> +    tabs = line.count('\t')
> +    return tabs * 7 + len(line)
> +
> +def break_long_line(line: str, indent='') -> list[str]:
> +    result = []
> +    while line_len(line) > COLUMN_LIMIT:
> +        i = line[:COLUMN_LIMIT - line_len(line)].rfind(' ')
> +        result.append(line[:i])
> +        line = indent + line[i + 1:]
> +    if line:
> +        result.append(line)
> +    return result
> +
> +def build_condition_string(node: ltl2ba.GraphNode):
> +    if not node.labels:
> +        return "(true)"
> +
> +    result = "("
> +
> +    first = True
> +    for label in sorted(node.labels):
> +        if not first:
> +            result += " && "
> +        result += label
> +        first = False
> +
> +    result += ")"
> +
> +    return result
> +
> +def abbreviate_atoms(atoms: list[str]) -> list[str]:
> +    abbrs = list()
> +    for atom in atoms:
> +        size = 1
> +        while True:
> +            abbr = atom[:size]
> +            if sum(a.startswith(abbr) for a in atoms) == 1:
> +                break
> +            size += 1
> +        abbrs.append(abbr.lower())
> +    return abbrs

I get this is just a matter of preference, so feel free to ignore my
suggestion.
This abbreviation algorithm doesn't work too well with atoms starting
with the same substring and can produce some unexpected result:

LTL_BLOCK_ON_RT_MUTEX:                b,
LTL_KERNEL_THREAD:                    ke,
LTL_KTHREAD_SHOULD_STOP:              kt,
LTL_NANOSLEEP:                        n,
LTL_PI_FUTEX:                         p,
LTL_RT:                               r,
LTL_SLEEP:                            s,
LTL_TASK_IS_MIGRATION:                task_is_m,
LTL_TASK_IS_RCU:                      task_is_r,
LTL_WAKE:                             wa,
LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO:    woken_by_e,
LTL_WOKEN_BY_HARDIRQ:                 woken_by_h,
LTL_WOKEN_BY_NMI:                     woken_by_n,

"woken_by_*" and "task_is_*" atom can get unnecessarily long and
while reading "kt" I might think about kernel_thread.

I was thinking about something like:

LTL_BLOCK_ON_RT_MUTEX:                b_o_r_m
LTL_KERNEL_THREAD:                    k_t
LTL_KTHREAD_SHOULD_STOP:              k_s_s
LTL_NANOSLEEP:                        n
LTL_PI_FUTEX:                         p_f
LTL_RT:                               r
LTL_SLEEP:                            s
LTL_TASK_IS_MIGRATION:                t_i_m
LTL_TASK_IS_RCU:                      t_i_r
LTL_WAKE:                             w
LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO:    w_b_e_o_h_p
LTL_WOKEN_BY_HARDIRQ:                 w_b_h
LTL_WOKEN_BY_NMI:                     w_b_n

or even

LTL_BLOCK_ON_RT_MUTEX:                b_m
LTL_KERNEL_THREAD:                    k_t
LTL_KTHREAD_SHOULD_STOP:              k_s_s
LTL_NANOSLEEP:                        n
LTL_PI_FUTEX:                         p_f
LTL_RT:                               r
LTL_SLEEP:                            s
LTL_TASK_IS_MIGRATION:                t_m
LTL_TASK_IS_RCU:                      t_r
LTL_WAKE:                             w
LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO:    w_e_h_p
LTL_WOKEN_BY_HARDIRQ:                 w_h
LTL_WOKEN_BY_NMI:                     w_n

I used the following code to come up with this:

def abbreviate_atoms(atoms: list[str]) -> list[str]:
    # completely arbitrary..
    skip = [ "is", "by", "or", "and" ]
    def abbr (n, s):
        return '_'.join(word[:n] for word in s.lower().split('_') if word not in skip)
    for n in range(1, 32):
        abbrs = [abbr(n, a) for a in atoms]
        if len(abbrs) == len(set(abbrs)):
            return abbrs

Which could even be tuned to use 2 letters per block instead of 1
(improving readability by a lot actually)..
'bl_on_rt_mu', 'ke_th', 'kt_sh_st', 'na', 'pi_fu', 'rt', 'sl', 'ta_mi',
'ta_rc', 'wa', 'wo_eq_hi_pr', 'wo_ha', 'wo_nm'

What do you think?

Thanks,
Gabriele