lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon,  6 May 2019 00:39:15 +0200
From:   Viktor Rosendahl <viktor.rosendahl@...il.com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        Viktor Rosendahl <viktor.rosendahl@...il.com>
Subject: Re: [PATCH v2 1/4] ftrace: Implement fs notification for preempt/irqsoff tracers

On 5/4/19 6:47 PM, Joel Fernandes wrote:
> On Wed, May 01, 2019 at 10:36:47PM +0200, Viktor Rosendahl wrote:
>> immediately after each other in spite of the fact that the
>> preempt/irqsoff tracers operate in overwrite mode.
>>
>> Signed-off-by: Viktor Rosendahl <viktor.rosendahl@...il.com>
>
> I agree with the general idea, but I don't really like how it is done in the
> patch.
>

Can you explain more precisely what you agree with?

The general idea of being able to trace bursts of latencies?

Or the slightly more specific idea of notifying user space when
/sys/kernel/debug/tracing/trace has new data, and let user space do the rest?

> We do have a notification mechanism already in the form of trace_pipe. Can we
> not improve that in some way to be notified of a new trace data? In theory,
> the trace_pipe does fit into the description in the documentation: "Reads
> from this file will block until new data is retrieved"
>

I am not quite sure what kind of solution you are after here. Could you be more
specific?

I think that it would be weird if we used trace_pipe to send a message with
the meaning "new data is available in the trace file". To me this would seem
like (re)inventing a special purpose alternative to inotify.

Another option would be to allow the user to consume the the latency traces
directly from trace_pipe, without using the trace file. This would make sense
to me and would indeed be a much better solution. If somebody is able to make
such an implementation I am all for it.

However, I do not know how to do this. I believe that it would lead towards a
significant rewrite of the latency tracers, probably also how the ftrace
buffering works.

My solution is clearly a hack but it has the benefits of requiring quite
small and straightforward changes to the kernel and the really ugly things are
then done in user space.

>>
>> +	config PREEMPTIRQ_FSNOTIFY

> Does this have to be a CONFIG option? If prefer if the code automatically
> does the notification and it is always enabled. I don't see any drawbacks of
> that.
>

I was just thinking that sending fsnotify events to a sysfs file is a bit off
the beaten track and I figured that some people would like to opt out.

It can of course be changed so that it is always enabled when a tracer that
requires it is enabled, if there is general agreement on this point.

>> +
>>  config SCHED_TRACER
>>  	bool "Scheduling Latency Tracer"
>>  	select GENERIC_TRACER
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index ca1ee656d6d8..ebefb8d4e072 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -44,6 +44,8 @@
>>  #include <linux/trace.h>
>>  #include <linux/sched/clock.h>
>>  #include <linux/sched/rt.h>
>> +#include <linux/fsnotify.h>
>> +#include <linux/workqueue.h>
>>
>>  #include "trace.h"
>>  #include "trace_output.h"
>> @@ -8191,6 +8193,32 @@ static __init void create_trace_instances(struct dentry *d_tracer)
>>  		return;
>>  }
>>
>> +#ifdef CONFIG_PREEMPTIRQ_FSNOTIFY
>> +
>> +static void trace_notify_workfn(struct work_struct *work)
> [snip]
>
> I prefer if this facility is available to other tracers as well such as
> the wakeup tracer which is similar in output (check
> Documentation/trace/ftrace.txt). I believe this should be a generic trace
> facility, and not tracer specific.
>

This should be possible.

If we stick with the fsnotify idea, it's possible to move the functions for it
to trace.c and to use it also from other tracers.

It is also possible to do a few small adjustments to the latency-collector and
to rename it the trace-collector, so that it can work for the wakeup case also.
The random sleep games probably don't make sense for the wakeup case but it's
already now a command line option.

Below is a suggestion for how to make the facility more generic and usable by
by both preempt/irqsoff and wakeup.

best regards,

Viktor
---
 kernel/trace/Kconfig              | 11 +++++++
 kernel/trace/trace.c              | 50 +++++++++++++++++++++++++++++--
 kernel/trace/trace.h              | 15 ++++++++++
 kernel/trace/trace_irqsoff.c      |  5 ++++
 kernel/trace/trace_sched_wakeup.c |  7 ++++-
 5 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8bd1d6d001d7..ca6a63004adf 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -244,6 +244,17 @@ config SCHED_TRACER
 	  This tracer tracks the latency of the highest priority task
 	  to be scheduled in, starting from the point it has woken up.
 
+config TRACE_FSNOTIFY
+	bool "Generate fsnotify events for the trace file"
+	default n
+	depends on (IRQSOFF_TRACER || PREEMPT_TRACER || SCHED_TRACER)
+	depends on FSNOTIFY
+	help
+	  This option will enable the generation of fsnotify events for the
+	  trace file. This makes it possible for userspace to be notified about
+	  modification of /sys/kernel/debug/tracing/trace through the inotify
+	  interface.
+
 config HWLAT_TRACER
 	bool "Tracer to detect hardware latencies (like SMIs)"
 	select GENERIC_TRACER
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ca1ee656d6d8..1640fd302e9a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -44,6 +44,8 @@
 #include <linux/trace.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/rt.h>
+#include <linux/fsnotify.h>
+#include <linux/workqueue.h>
 
 #include "trace.h"
 #include "trace_output.h"
@@ -8191,6 +8193,50 @@ static __init void create_trace_instances(struct dentry *d_tracer)
 		return;
 }
 
+#ifdef CONFIG_TRACE_FSNOTIFY
+
+static struct workqueue_struct *notify_wq;
+
+static void trace_notify_workfn(struct work_struct *work)
+{
+	struct trace_array *tr = container_of(work, struct trace_array,
+					      notify_work);
+	fsnotify(tr->d_trace->d_inode, FS_MODIFY, tr->d_trace->d_inode,
+		 FSNOTIFY_EVENT_INODE, NULL, 0);
+}
+
+static void trace_create_trace_file(struct trace_array *tr,
+				    struct dentry *d_tracer)
+{
+	/* For notify we need to init the work structure and save the pointer */
+	INIT_WORK(&tr->notify_work, trace_notify_workfn);
+	tr->d_trace = trace_create_file("trace", 0644, d_tracer, tr,
+					&tracing_fops);
+}
+
+static __init void trace_file_notify_init(void)
+{
+	notify_wq = alloc_workqueue("irqsoff_notify_wq",
+				    WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!notify_wq)
+		pr_err("Unable to allocate irqsoff_notify_wq");
+}
+
+void trace_file_notify(struct trace_array *tr)
+{
+	if (likely(notify_wq))
+		queue_work(notify_wq, &tr->notify_work);
+}
+
+#else /* !CONFIG_TRACE_FSNOTIFY */
+
+#define trace_create_trace_file(tr, d_tracer) \
+	trace_create_file("trace", 0644, d_tracer, tr, &tracing_fops)
+
+#define trace_file_notify_init() do {} while (0)
+
+#endif /* !CONFIG_TRACE_FSNOTIFY */
+
 static void
 init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 {
@@ -8209,8 +8255,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("trace_options", 0644, d_tracer,
 			  tr, &tracing_iter_fops);
 
-	trace_create_file("trace", 0644, d_tracer,
-			  tr, &tracing_fops);
+	trace_create_trace_file(tr, d_tracer);
 
 	trace_create_file("trace_pipe", 0444, d_tracer,
 			  tr, &tracing_pipe_fops);
@@ -8415,6 +8460,7 @@ static __init int tracer_init_tracefs(void)
 {
 	struct dentry *d_tracer;
 
+	trace_file_notify_init();
 	trace_access_lock_init();
 
 	d_tracer = tracing_init_dentry();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d80cee49e0eb..46f20c9bcf36 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -17,6 +17,7 @@
 #include <linux/compiler.h>
 #include <linux/trace_seq.h>
 #include <linux/glob.h>
+#include <linux/workqueue.h>
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 #include <asm/unistd.h>		/* For NR_SYSCALLS	     */
@@ -302,6 +303,10 @@ struct trace_array {
 	struct dentry		*options;
 	struct dentry		*percpu_dir;
 	struct dentry		*event_dir;
+#ifdef CONFIG_TRACE_FSNOTIFY
+	struct dentry		*d_trace;
+	struct work_struct	notify_work;
+#endif
 	struct trace_options	*topts;
 	struct list_head	systems;
 	struct list_head	events;
@@ -774,6 +779,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
 		    struct trace_pid_list **new_pid_list,
 		    const char __user *ubuf, size_t cnt);
 
+#ifdef CONFIG_TRACE_FSNOTIFY
+
+void trace_file_notify(struct trace_array *tr);
+
+#else /* !CONFIG_TRACE_FSNOTIFY */
+
+#define trace_file_notify(tr) do {} while (0)
+
+#endif /* !CONFIG_TRACE_FSNOTIFY */
+
 #ifdef CONFIG_TRACER_MAX_TRACE
 void update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu,
 		   void *cond_data);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index a745b0cee5d3..911bb635fb4e 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -323,6 +323,7 @@ check_critical_timing(struct trace_array *tr,
 	u64 T0, T1, delta;
 	unsigned long flags;
 	int pc;
+	bool notify = false;
 
 	T0 = data->preempt_timestamp;
 	T1 = ftrace_now(cpu);
@@ -353,6 +354,7 @@ check_critical_timing(struct trace_array *tr,
 	if (likely(!is_tracing_stopped())) {
 		tr->max_latency = delta;
 		update_max_tr_single(tr, current, cpu);
+		notify = true;
 	}
 
 	max_sequence++;
@@ -364,6 +366,9 @@ check_critical_timing(struct trace_array *tr,
 	data->critical_sequence = max_sequence;
 	data->preempt_timestamp = ftrace_now(cpu);
 	__trace_function(tr, CALLER_ADDR0, parent_ip, flags, pc);
+
+	if (unlikely(notify))
+		trace_file_notify(tr);
 }
 
 static nokprobe_inline void
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 743b2b520d34..c87e901108b9 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -437,6 +437,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	long disabled;
 	int cpu;
 	int pc;
+	bool report = false;
+	struct trace_array *tr = wakeup_trace;
 
 	tracing_record_cmdline(prev);
 
@@ -481,7 +483,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	T1 = ftrace_now(cpu);
 	delta = T1-T0;
 
-	if (!report_latency(wakeup_trace, delta))
+	report = report_latency(wakeup_trace, delta);
+	if (!report)
 		goto out_unlock;
 
 	if (likely(!is_tracing_stopped())) {
@@ -495,6 +498,8 @@ probe_wakeup_sched_switch(void *ignore, bool preempt,
 	local_irq_restore(flags);
 out:
 	atomic_dec(&per_cpu_ptr(wakeup_trace->trace_buffer.data, cpu)->disabled);
+	if (unlikely(report))
+		trace_file_notify(tr);
 }
 
 static void __wakeup_reset(struct trace_array *tr)
-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ