Message-ID: <20220301194601.GT4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Tue, 1 Mar 2022 11:46:01 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Daniel Bristot de Oliveira <bristot@...nel.org>
Cc:     Nicolas Saenz Julienne <nsaenzju@...hat.com>, rostedt@...dmis.org,
        mingo@...hat.com, linux-kernel@...r.kernel.org, mtosatti@...hat.com
Subject: Re: [PATCH] tracing/osnoise: Force quiescent states while tracing

On Tue, Mar 01, 2022 at 08:29:23PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/1/22 19:58, Paul E. McKenney wrote:
> > On Tue, Mar 01, 2022 at 07:44:38PM +0100, Daniel Bristot de Oliveira wrote:
> >> On 3/1/22 19:05, Paul E. McKenney wrote:
> >>>> I see, as long as it costs < 1 us, I am ok. If it gets > 1 us on a reasonably
> >>>> fast machine, we start to see HW noise where it does not exist, and that would
> >>>> reduce the resolution of osnoise. AFAICS, it is not causing that problem, but we
> >>>> need to make it as lightweight as possible.
> >>> In the common case, it is atomically incrementing a local per-CPU counter
> >>> and doing a store.  This should be quite cheap.
> >>>
> >>> The uncommon case is when the osnoise process was preempted or otherwise
> >>> interfered with during a recent RCU read-side critical section and
> >>> preemption was disabled around that critical section's outermost
> >>> rcu_read_unlock().  This can be quite expensive.  But I would expect
> >>> you to just not do this.  ;-)
> >>
> >> Getting the expensive call after a preemption is not a problem, it is a side
> >> effect of the most costly preemption.
> >>
> >> In this case, we should "ping rcu" before reading the time, to account for the
> >> overhead of the previous preemption which caused it.
> >>
> >> like (using the current code as example):
> >>
> >> ------------------------- %< -------------------------------
> >> static u64
> >> set_int_safe_time(struct osnoise_variables *osn_var, u64 *time)
> >> {
> >>         u64 int_counter;
> >>
> >>         do {
> >>                 int_counter = local_read(&osn_var->int_counter);
> >>
> >> 		------------> HERE <-------------------------------------
> >>
> >>                 /* synchronize with interrupts */
> >>                 barrier();
> >>
> >>                 *time = time_get();
> >>
> >>                 /* synchronize with interrupts */
> >>                 barrier();
> >>         } while (int_counter != local_read(&osn_var->int_counter));
> >>
> >>         return int_counter;
> >> }
> >> ------------------------- >% -------------------------------
> >>
> >> In this way, anything that happens before this *time is accounted for before it
> >> is read. If anything happens while this loop is running, it will run again, so it
> >> is safe to attribute it to the previous case.
> >>
> >> We would have to make a copy of this function, and only use the copy for the
> >> run_osnoise() case. A good name would be something in the lines of
> >> set_int_safe_time_rcu().
> >>
> >> (Unless the expensive call is < 1 us.)
> > 
> > The outermost rcu_read_unlock() could involve a call into the scheduler
> > to do an RCU priority deboost, which I would imagine could be a bit
> > expensive.  But I would expect you to configure the test in such a way
> > that there was no need for RCU priority boosting.  For example, by making
> > sure that the osnoise process's RCU readers were never preempted.
> 
> So, the noise will not be seen in the call that Nicolas is adding, but in the
> rcu_read_unlock() inside the osnoise process?
> 
> If that is the case, then the "noise" would already be accounted to the
> previously preempted thread... and we should be fine then.

It could be either at the rcu_read_unlock() itself, or, if preemption
was disabled across that rcu_read_unlock(), at a subsequent point where
preemption is enabled.  Which might amount to the same thing given that
there won't be any preemption until preemption is enabled?

> > Just out of curiosity, why is this running in the kernel rather than in
> > userspace?  To focus only on kernel-centric noise sources?  Or are there
> > people implementing real-time applications within the kernel?
> 
> It is in kernel because it allows me to sync the workload and the trace, getting
> more (and more precise) information.
> 
> For example, I can read the "noise in time" and how many interrupts happened in
> between two reads of the time, so I can look back in the trace to figure out
> which sources of noise were the cause of the noise I am seeing - without false
> positives. If no "interference" happened, I can safely say that it was a
> hardware noise (this saves us time in the debug, no need to run hwlat - I run
> two tools in one).
> 
> All this with cheaper access to the data. I also use such information to parse
> the trace in the kernel in a cheaper way, printing less info to the trace buffer.

Fair enough!

> But the idea is to see the noise for a user-space application as much as
> possible (and no, I am not writing applications in the kernel... but I know people
> doing it using a unikernel, but that is another story... a longer one... :-)).

There have been people writing their applications in Linux kernel modules,
or at least attempting to do so!  ;-)

							Thanx, Paul
