linux-kernel - Re: [PATCH v4 20/22] rv: Add rtapp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250425063456.NBE35YHR@linutronix.de>
Date: Fri, 25 Apr 2025 08:34:56 +0200
From: Nam Cao <namcao@...utronix.de>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
	linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
	john.ogness@...utronix.de
Subject: Re: [PATCH v4 20/22] rv: Add rtapp_sleep monitor

On Thu, Apr 24, 2025 at 03:55:34PM +0200, Gabriele Monaco wrote:
> I've been playing with these monitors, code-wise they look good.
> I tested a bit and they seem to work without many surprises by doing
> something as simple as:
> 
> perf stat -e rv:error_sleep stress-ng --cpu-sched 1 -t 10s
>   -- shows several errors --

This one is a monitor's bug.

The monitor mistakenly sees the task getting woken up, *then* sees it going
to sleep.

This is due to trace_sched_switch() being called with a stale 'prev_state'.
'prev_state' is read at the beginning of __schedule(), but
trace_sched_switch() is invoked a bit later. Therefore if task->__state is
changed inbetween, 'prev_state' is not the value of task->__state.

The monitor checks (prev_state & TASK_INTERRUPTIBLE) to determine if the
task is going to sleep. This can be incorrect due to the race above. The
monitor sees the task going to sleep, but actually it is just preempted.

I think this also answers the race you observed with the srs monitor?

> perf stat -e rv:error_sleep stress-ng --prio-inv 1 --prio-inv-policy rr
>   -- shows only 1 error (normal while starting the program?) --
> 
> Not quite sound, but does it look a reasonable test to you?

The above command use mutexes with priority inheritance. That is good for
real-time. The errors are due to real-time tasks being delayed by
waitpid().

Priority inheritance can be disabled with "--prio-inv-type none". Then you
will see lots of errors with mutexes.

> I quickly tried the same with the other monitor comparing the number of
> errors with the page_faults generated by perf, but that didn't make too
> much sense. Perhaps I'm doing something wrong here though (the number
> reported by perf for page faults feels a bit too high).
> 
> perf stat -e page-faults -e rv:error_pagefault stress-ng --cyclic 1

This command run a non-real-time thread to do setup, and a cyclic real-time
thread. The number of pagefaults of each thread would be roughly
proportional to the code size executed by each thread. As the non-real-time
thread's code size is bigger, it sounds reasonable that the number of
pagefaults is greater than the number of monitor's warnings.
> 
> Anyway, the monitor looks good to me
> 
>   Reviewed-by: Gabriele Monaco <gmonaco@...hat.com>
> 
> but it'd be nice if you have tips to share how to quickly test it (e.g.
> without writing a custom workload).

I tested the monitor on a real system. My system has some real-time audio
processing processes (pipewire, firefox running youtube), yours also
should.

But thanks so much for testing with stress-ng. My testing didn't stress the
system enough for the above race to happen. I will give stress-ng a few
runs before the next version.

Best regards,
Nam