linux-kernel - Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6fd678d8-58fc-5863-fd88-4c8335dd8b38@kernel.org>
Date:   Tue, 1 Mar 2022 20:29:23 +0100
From:   Daniel Bristot de Oliveira <bristot@...nel.org>
To:     paulmck@...nel.org
Cc:     Nicolas Saenz Julienne <nsaenzju@...hat.com>, rostedt@...dmis.org,
        mingo@...hat.com, linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On 3/1/22 19:58, Paul E. McKenney wrote:
> On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
>> On 3/1/22 19:05, Paul E. McKenney wrote:
>>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1us in a reasonably
>>>> fast machine, we start see HW noise where it does not exist, and that would
>>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
>>>> need to make it as lightweight as possible.
>>> In the common case, it is atomically incrementing a local per-CPU counter
>>> and doing a store.  This should be quite cheap.
>>>
>>> The uncommon case is when the osnoise process was preempted or otherwise
>>> interfered with during a recent RCU read-side critical section and
>>> preemption was disabled around that critical section's outermost
>>> rcu_read_unlock().  This can be quite expensive.  But I would expect
>>> you to just not do this.  ;-)
>>
>> Getting the expensive call after a preemption is not a problem, it is a side
>> effect of the most costly preemption.
>>
>> It this case, we should "ping rcu" before reading the time to account the
>> overhead for the previous preemption which caused it.
>>
>> like (using the current code as example):
>>
>> ------------------------- %< -------------------------------
>> static u64
>> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
>> {
>>         u64 int_counter;
>>
>>         do {
>>                 int_counter = local_read(&osn_var->int_counter);
>>
>> 		------------> HERE <-------------------------------------
>>
>>                 /* synchronize with interrupts */
>>                 barrier();
>>
>>                 *time = time_get();
>>
>>                 /* synchronize with interrupts */
>>                 barrier();
>>         } while (int_counter != local_read(&osn_var->int_counter));
>>
>>         return int_counter;
>> }
>> ------------------------- >% -------------------------------
>>
>> In this way anything that happens before this *time is accounted before it is
>> get. If anything happens while this loop is running, it will run again, so it is
>> safe to point to the previous case.
>>
>> We would have to make a copy of this function, and only use the copy for the
>> run_osnoise() case. A good name would be something in the lines of
>> set_int_safe_time_rcu().
>>
>> (Unless the expensive is < than 1us.)
> 
> The outermost rcu_read_unlock() could involve a call into the scheduler
> to do an RCU priority deboost, which I would imagine could be a bit
> expensive.  But I would expect you to configure the test in such a way
> that there was no need for RCU priority boosting.  For example, by making
> sure that the osnoise process's RCU readers were never preempted.

So, the noise will not be seeing in the call that Nicolas is doing. but in the
rcu_read_unlock() inside osnoise processes?

If that is the case, then the "noise" would already be accounted to the
previously preempted thread... and we should be fine then.

> 
> Just out of curiosity, why is this running in the kernel rather than in
> userspace?  To focus only on kernel-centric noise sources?  Or are there
> people implementing real-time applications within the kernel?

It is in kernel because it allows me to sync the workload and the trace, getting
more (and more precise) information.

For example, I can read the "noise in time" and how many interrupts happened in
between two reads of the time, so I can look back in the trace to figure out
which sources of noise were the cause of the noise I am seeing - without false
positives. If no "interference" happened, I can safely say that it was a
hardware noise (this saves us time in the debug, no need to run hwlat - I run
two tools in one).

This all with a more cheap access to the data. I also use such information to
parse trace in kernel in a cheaper way, printing less info to the trace buffer.

But the idea is to see the noise for an user-space application as much as
possible (and no, I am not doing application in kernel... but I know people
doing it using a unikernel, but that is another story... a longer one... :-)).

-- Daniel

> 
> 							Thanx, Paul