Message-ID: <169b8478-1eff-46b1-a782-f0cb529330bb@redhat.com>
Date: Fri, 25 Apr 2025 07:35:36 +0000 (UTC)
From: Gabriele Monaco <gmonaco@...hat.com>
To: Nam Cao <namcao@...utronix.de>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-trace-kernel@...r.kernel.org,
linux-kernel@...r.kernel.org, john.ogness@...utronix.de
Subject: Re: [PATCH v4 20/22] rv: Add rtapp_sleep monitor
2025-04-25T06:35:09Z Nam Cao <namcao@...utronix.de>:
> On Thu, Apr 24, 2025 at 03:55:34PM +0200, Gabriele Monaco wrote:
>> I've been playing with these monitors, code-wise they look good.
>> I tested a bit and they seem to work without many surprises by doing
>> something as simple as:
>>
>> perf stat -e rv:error_sleep stress-ng --cpu-sched 1 -t 10s
>> -- shows several errors --
>
> This one is a monitor's bug.
>
> The monitor mistakenly sees the task getting woken up, *then* sees it going
> to sleep.
>
> This is due to trace_sched_switch() being called with a stale 'prev_state'.
> 'prev_state' is read at the beginning of __schedule(), but
> trace_sched_switch() is invoked a bit later. If task->__state changes in
> between, 'prev_state' no longer matches the actual task->__state.
>
> The monitor checks (prev_state & TASK_INTERRUPTIBLE) to determine if the
> task is going to sleep. This can be incorrect due to the race above. The
> monitor sees the task going to sleep, but actually it is just preempted.
>
> I think this also answers the race you observed with the srs monitor?
>
Yeah, that could be the culprit.
Peter's fix [1] landed in linux-next recently; I expect it to reach the upstream tree in a couple of days, at which point you may no longer see the problem.
That said, I didn't check exactly what --cpu-sched does, but I expect it to do all sorts of wild things to stress the scheduler, so it may be normal to see more errors than with, say, --cyclic, which only runs nanosleeps.
>> perf stat -e rv:error_sleep stress-ng --prio-inv 1 --prio-inv-policy rr
>> -- shows only 1 error (normal while starting the program?) --
>>
>> Not quite sound, but does it look like a reasonable test to you?
>
> The above command uses mutexes with priority inheritance. That is good for
> real-time. The errors are due to real-time tasks being delayed by
> waitpid().
>
> Priority inheritance can be disabled with "--prio-inv-type none". Then you
> will see lots of errors with mutexes.
>
Great, that's exactly what I wanted to know, thanks.
>> I quickly tried the same with the other monitor comparing the number of
>> errors with the page_faults generated by perf, but that didn't make too
>> much sense. Perhaps I'm doing something wrong here though (the number
>> reported by perf for page faults feels a bit too high).
>>
>> perf stat -e page-faults -e rv:error_pagefault stress-ng --cyclic 1
>
> This command runs a non-real-time thread to do setup, and a cyclic real-time
> thread. The number of pagefaults of each thread would be roughly
> proportional to the code size executed by each thread. As the non-real-time
> thread's code size is bigger, it sounds reasonable that the number of
> pagefaults is greater than the number of monitor's warnings.
Mmh, I guessed something like that, although the numbers were a bit out of proportion (e.g. 500 page-faults vs. 8 errors). Then again, I didn't check too carefully what happens under the hood.
>>
>> Anyway, the monitor looks good to me
>>
>> Reviewed-by: Gabriele Monaco <gmonaco@...hat.com>
>>
>> but it'd be nice if you have tips to share how to quickly test it (e.g.
>> without writing a custom workload).
>
> I tested the monitor on a real system. My system has some real-time audio
> processing processes (pipewire, firefox running youtube), yours also
> should.
That's a good point. I should also mention I was running these tests in a VM (virtme-ng), so the system stress is minimal and the setup may trigger different oddities (filesystems are overlays, and some other things are set up differently from a real system).
>
> But thanks so much for testing with stress-ng. My testing didn't stress the
> system enough for the above race to happen. I will give stress-ng a few
> runs before the next version.
>
> Best regards,
> Nam
Thank you for the tips!
Cheers,
Gabriele
[1] - https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8feb053d53194382fcfb68231296fdc220497ea6