linux-kernel - Re: [RFC PATCH] tracing: add sched_prio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160706135324.GA8583@sinkpad.internal.efficios.com>
Date:	Wed, 6 Jul 2016 09:53:24 -0400
From:	Julien Desfossez <jdesfossez@...icios.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>, daolivei@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] tracing: add sched_prio_update

On 06-Jul-2016 09:13:25 AM, Steven Rostedt wrote:
> On Tue, 5 Jul 2016 21:50:34 +0000 (UTC)
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
> > >   
> > >> +
> > >> +	TP_PROTO(struct task_struct *tsk),
> > >> +
> > >> +	TP_ARGS(tsk),
> > >> +
> > >> +	TP_STRUCT__entry(
> > >> +		__array( char,	comm,	TASK_COMM_LEN	)  
> > > 
> > > I could imagine this being a high frequency tracepoint, especially with
> > > a lot of boosting going on. Can we nuke the comm recording and let the
> > > userspace tools just hook to the sched_switch tracepoint for that?  
> > 
> > We can surely do that.
> > 
> > Just to clarify: currently this tracepoint is *not* hooked on PI boosting,
> > as described in the changelog. This tracepoint is about the prio attributes
> > set by user-space. The PI boosting temporarily changes the task struct prio
> > without updating the associated policy, which seems rather
> > implementation-specific and odd to expose.
> > 
> > Thoughts ?
> 
> Ah, you're right, I was thinking it was at boosting. But still, it's a
> rather hefty tracepoint (lots of fields), probably want to keep from
> adding comm too.

Yes, I agree we can remove the comm field, it is easy to get from the
previous sched_switch.

> > >> +		__field( pid_t,	pid			)
> > >> +		__field( unsigned int,	policy		)
> > >> +		__field( int,	nice			)
> > >> +		__field( unsigned int,	rt_priority	)
> > >> +		__field( u64,	dl_runtime		)
> > >> +		__field( u64,	dl_deadline		)
> > >> +		__field( u64,	dl_period		)
> > >> +	),
> > >> +
> > >> +	TP_fast_assign(
> > >> +		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
> > >> +		__entry->pid		= tsk->pid;
> > >> +		__entry->policy		= tsk->policy;
> > >> +		__entry->nice		= task_nice(tsk);
> > >> +		__entry->rt_priority	= tsk->rt_priority;
> > >> +		__entry->dl_runtime	= tsk->dl.dl_runtime;
> > >> +		__entry->dl_deadline	= tsk->dl.dl_deadline;
> > >> +		__entry->dl_period	= tsk->dl.dl_period;
> > >> +	),
> > >> +
> > >> +	TP_printk("comm=%s pid=%d, policy=%s, nice=%d, rt_priority=%u, "
> > >> +			"dl_runtime=%Lu, dl_deadline=%Lu, dl_period=%Lu",
> > >> +			__entry->comm, __entry->pid,
> > >> +			__print_symbolic(__entry->policy, SCHEDULING_POLICY),
> > >> +			__entry->nice, __entry->rt_priority,
> > >> +			__entry->dl_runtime, __entry->dl_deadline,
> > >> +			__entry->dl_period)
> > >> +);
> > >>  #endif /* _TRACE_SCHED_H */
> > >>  
> > >>  /* This part must be outside protection */
> > >> diff --git a/kernel/fork.c b/kernel/fork.c
> > >> index 7926993..ac4294a 100644
> > >> --- a/kernel/fork.c
> > >> +++ b/kernel/fork.c
> > >> @@ -1773,6 +1773,7 @@ long _do_fork(unsigned long clone_flags,
> > >>  		struct pid *pid;
> > >>  
> > >>  		trace_sched_process_fork(current, p);
> > >> +		trace_sched_prio_update(p);
> 
> From the change log:
> 
> "It is emitted in the code path of the sched_setscheduler,
>  sched_setattr, sched_setparam, nice and the fork system calls. For fork, it is emitted
>  after the sched_process_fork tracepoint for timeline consistency and
>  because the PID is not yet set when sched_fork() is called."
> 
> I'm not convinced this should be needed. I hate adding back to back
> tracepoints.

Indeed, having two tracepoints back to back is not pretty. We placed it
here to get the priority of the newly created threads. Maybe a more
appropriate way of doing that would be to extend the sched_process_fork
tracepoint to output the same scheduling informations. Would you prefer
that option ?

Thanks,

Julien