linux-kernel - Re: [PATCH V3 9/9] tracing: Add timerlat tracer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <80f74da1-a9a7-2af9-dd4f-c22f312214df@redhat.com>
Date:   Sat, 12 Jun 2021 10:47:16 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org, Phil Auld <pauld@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Kate Carcia <kcarcia@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Clark Willaims <williams@...hat.com>,
        John Kacur <jkacur@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>, linux-doc@...r.kernel.org
Subject: Re: [PATCH V3 9/9] tracing: Add timerlat tracer

On 6/11/21 10:48 PM, Steven Rostedt wrote:
> On Fri, 11 Jun 2021 16:13:36 +0200
> Daniel Bristot de Oliveira <bristot@...hat.com> wrote:
> 
>>>> +
>>>> +#ifdef CONFIG_STACKTRACE
>>>> +/*
>>>> + * Stack trace will take place only at IRQ level, so, no need
>>>> + * to control nesting here.
>>>> + */
>>>> +struct trace_stack {
>>>> +	int stack_size;
>>>> +	int nr_entries;
>>>> +	unsigned long           calls[PAGE_SIZE];  
>>>
>>> That is rather big. It's 8 * PAGE_SIZE. I don't think that's what you really
>>> wanted.  
>>
>> no, I did not want that... is 256 a good number?
> 
> Sure. But make it a macro.
> 
> #define MAX_CALLS 256
> 
> or something like that.

Ack!

>>
>>>> +};
>>>> +
>>>> +static DEFINE_PER_CPU(struct trace_stack, trace_stack);
>>>> +
>>>> +/**  
>>>
>>> Again, remove the KernelDoc notation of /**, or make it real kerneldoc
>>> notation.  
>>
>> Fixed!
>>
>> [...]
>>
>>>>   *
>>>> @@ -801,6 +1017,22 @@ void trace_softirq_exit_callback(void *data, unsigned int vec_nr)
>>>>  	if (!osn_var->sampling)
>>>>  		return;
>>>>  
>>>> +#ifdef CONFIG_TIMERLAT_TRACER
>>>> +	/*
>>>> +	 * If the timerlat is enabled, but the irq handler did
>>>> +	 * not run yet enabling timerlat_tracer, do not trace.
>>>> +	 */
>>>> +	if (unlikely(osnoise_data.timerlat_tracer)) {
>>>> +		struct timerlat_variables *tlat_var;
>>>> +		tlat_var = this_cpu_tmr_var();
>>>> +		if (!tlat_var->tracing_thread) {  
>>>
>>> What happens if the timer interrupt triggers here?  
>>
>> The tracer will not report the softirq overhead. But at this point, the softirq
>> is returning, and the duration would be from this time to...
>>
>>
>>
>>>> +			osn_var->softirq.arrival_time = 0;
>>>> +			osn_var->softirq.delta_start = 0;
>>>> +			return;
>>>> +		}
>>>> +	}
>>>> +#endif
>>>> +
>>>>  	duration = get_int_safe_duration(osn_var, &osn_var->softirq.delta_start);  
>>
>> here.
>>
>> We can disable interrupts to avoid this issue. But the question is, is it worth
>> to disable interrupts to avoid this problem?
>>
>>>>  	trace_softirq_noise(vec_nr, osn_var->softirq.arrival_time, duration);
>>>>  	cond_move_thread_delta_start(osn_var, duration);
>>>> @@ -893,6 +1125,18 @@ thread_exit(struct osnoise_variables *osn_var, struct task_struct *t)
>>>>  	if (!osn_var->sampling)
>>>>  		return;
>>>>  
>>>> +#ifdef CONFIG_TIMERLAT_TRACER
>>>> +	if (osnoise_data.timerlat_tracer) {
>>>> +		struct timerlat_variables *tlat_var;
>>>> +		tlat_var = this_cpu_tmr_var();
>>>> +		if (!tlat_var->tracing_thread) {  
>>>
>>> Or here?  
>>
>> The problem that can happen with the softirq cannot happen here: this code runs
>> with interrupts disabled on __schedule() (it is hooked to the sched_switch).
>>
>>>> +			osn_var->thread.delta_start = 0;
>>>> +			osn_var->thread.arrival_time = 0;
>>>> +			return;
>>>> +		}
>>>> +	}
>>>> +#endif
>>>> +
>>>>  	duration = get_int_safe_duration(osn_var, &osn_var->thread.delta_start);
>>>>  
>>>>  	trace_thread_noise(t, osn_var->thread.arrival_time, duration);
>>>> @@ -1182,6 +1426,197 @@ static int osnoise_main(void *data)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_TIMERLAT_TRACER
>>>> +/**
>>>> + * timerlat_irq - hrtimer handler for timerlat.
>>>> + */
>>>> +static enum hrtimer_restart timerlat_irq(struct hrtimer *timer)
>>>> +{
>>>> +	struct osnoise_variables *osn_var = this_cpu_osn_var();
>>>> +	struct trace_array *tr = osnoise_trace;
>>>> +	struct timerlat_variables *tlat;
>>>> +	struct timerlat_sample s;
>>>> +	u64 now;
>>>> +	u64 diff;
>>>> +
>>>> +	/*
>>>> +	 * I am not sure if the timer was armed for this CPU. So, get
>>>> +	 * the timerlat struct from the timer itself, not from this
>>>> +	 * CPU.
>>>> +	 */
>>>> +	tlat = container_of(timer, struct timerlat_variables, timer);
>>>> +
>>>> +	now = ktime_to_ns(hrtimer_cb_get_time(&tlat->timer));
>>>> +
>>>> +	/*
>>>> +	 * Enable the osnoise: events for thread an softirq.
>>>> +	 */
>>>> +	tlat->tracing_thread = true;
>>>> +
>>>> +	osn_var->thread.arrival_time = time_get();
>>>> +
>>>> +	/*
>>>> +	 * A hardirq is running: the timer IRQ. It is for sure preempting
>>>> +	 * a thread, and potentially preempting a softirq.
>>>> +	 *
>>>> +	 * At this point, it is not interesting to know the duration of the
>>>> +	 * preempted thread (and maybe softirq), but how much time they will
>>>> +	 * delay the beginning of the execution of the timer thread.
>>>> +	 *
>>>> +	 * To get the correct (net) delay added by the softirq, its delta_start
>>>> +	 * is set as the IRQ one. In this way, at the return of the IRQ, the delta
>>>> +	 * start of the sofitrq will be zeroed, accounting then only the time
>>>> +	 * after that.
>>>> +	 *
>>>> +	 * The thread follows the same principle. However, if a softirq is
>>>> +	 * running, the thread needs to receive the softirq delta_start. The
>>>> +	 * reason being is that the softirq will be the last to be unfolded,
>>>> +	 * resseting the thread delay to zero.
>>>> +	 */
>>>> +#ifndef CONFIG_PREEMPT_RT
>>>> +	if (osn_var->softirq.delta_start) {
>>>> +		copy_int_safe_time(osn_var, &osn_var->thread.delta_start,
>>>> +				   &osn_var->softirq.delta_start);  
>>>
>>> Isn't softirq.delta_start going to be zero here? It doesn't look to get
>>> updated until you set tracing_thread to true, but that happens here, and as
>>> this is in a interrupt context, there will not be a softirq happening
>>> between the setting of that to true to this point.  
>>
>> No... on the timerlat, the "sampling" is always on. And the
>> osnoise_data.timerlat_tracer is only checked at the softirq return, so the
>> softirq entry always set set the delta_start.
> 
> OK, I was confused by the timerlat using the "__osnoise_tracer_start()". If
> timerlat is going to use that, perhaps we need to rename it, because the
> "osnoise" is one tracer, and its confusing that the "timerlat" is using
> functions called "*_osnoise_*". I was thinking that those functions were
> only for the osnoise tracer and not part of the timerlat tracer, and
> ignored them when looking at what the timerlat tracer was doing.
> 
> Can we rename that to simply "start_latency_tracing()" or something more
> generic.

right, also considering that we will (I hope) have the rtsl next, and other
ideas coming around the usage of osnoise: events on other tracers, I also think
should a more generic term to the events. Indeed, on rtsl they have a different
name.

Thinking only about the instrumentation/events, what they are tracking is the
execution time. So how about naming them as:

exec_time:thread
exec_time:irq

Also adding that, although here we measure the execution time of "task" context,
on rtsl we have other kinds of "windows" that they measure, for instance, the
poid window (Preemption or IRQ disabled). So, the term exec time also fits there.

?

>>
>>>> +
>>>> +		copy_int_safe_time(osn_var, &osn_var->softirq.delta_start,
>>>> +				    &osn_var->irq.delta_start);
>>>> +	} else {
>>>> +		copy_int_safe_time(osn_var, &osn_var->thread.delta_start,
>>>> +				    &osn_var->irq.delta_start);
>>>> +	}
>>>> +#else /* CONFIG_PREEMPT_RT */
>>>> +	/*
>>>> +	 * The sofirqs run as threads on RT, so there is not need
>>>> +	 * to keep track of it.
>>>> +	 */
>>>> +	copy_int_safe_time(osn_var, &osn_var->thread.delta_start, &osn_var->irq.delta_start);
>>>> +#endif /* CONFIG_PREEMPT_RT */
>>>> +
>>>> +	/*
>>>> +	 * Compute the current time with the expected time.
>>>> +	 */
>>>> +	diff = now - tlat->abs_period;
>>>> +
>>>> +	tlat->count++;
>>>> +	s.seqnum = tlat->count;
>>>> +	s.timer_latency = diff;
>>>> +	s.context = IRQ_CONTEXT;
>>>> +
>>>> +	trace_timerlat_sample(&s);
>>>> +
>>>> +	/* Keep a running maximum ever recorded os noise "latency" */
>>>> +	if (diff > tr->max_latency) {
>>>> +		tr->max_latency = diff;
>>>> +		latency_fsnotify(tr);
>>>> +	}
>>>> +
>>>> +	if (osnoise_data.stop_tracing_in)
>>>> +		if (time_to_us(diff) >= osnoise_data.stop_tracing_in)
>>>> +			osnoise_stop_tracing();
>>>> +
>>>> +	wake_up_process(tlat->kthread);
>>>> +
>>>> +#ifdef CONFIG_STACKTRACE
>>>> +	if (osnoise_data.print_stack)
>>>> +		timerlat_save_stack(0);
>>>> +#endif  
>>>
>>> No need for the #ifdef above. timerlat_save_stack() is defined as a nop
>>> when not enabled, and the compiler will just optimize this out.  
>>
>> The osnoise_data.print_stack is ifdefed, should I remove it from ifdef?
> 
> Well, the above ifdef is for STACKTRACE not for TIMERLAT_TRACER, which
> encompasses all of this. And the "timerlat_save_stack()" is a nop when
> STACKTRACE is not defined. So no.

Ooooppppsss, right, miss attention from my side.

-- Daniel

> -- Steve
> 
>>
>>>   
>>>> +
>>>> +	return HRTIMER_NORESTART;
>>>> +}
>>>> +
>