Message-ID: <711ff45f008bb4943418c40eba604e83858767ff.camel@redhat.com>
Date: Tue, 19 Aug 2025 12:34:23 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
 Masami Hiramatsu <mhiramat@...nel.org>, Ingo Molnar <mingo@...hat.com>,
 linux-trace-kernel@...r.kernel.org, Nam Cao <namcao@...utronix.de>,
 Tomas Glozar <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>,
 Clark Williams <williams@...hat.com>, John Kacur <jkacur@...hat.com>
Subject: Re: [RFC PATCH 14/17] sched: Add deadline tracepoints
On Tue, 2025-08-19 at 12:12 +0200, Peter Zijlstra wrote:
> On Tue, Aug 19, 2025 at 11:56:57AM +0200, Juri Lelli wrote:
> > Hi!
> >
> > On 14/08/25 17:08, Gabriele Monaco wrote:
> > > Add the following tracepoints:
> > >
> > >  * sched_dl_throttle(dl):
> > >      Called when a deadline entity is throttled
> > >  * sched_dl_replenish(dl):
> > >      Called when a deadline entity's runtime is replenished
> > >  * sched_dl_server_start(dl):
> > >      Called when a deadline server is started
> > >  * sched_dl_server_stop(dl, hard):
> > >      Called when a deadline server is stopped (hard) or put to idle
> > >      waiting for the next period (!hard)
> > >
> > > Those tracepoints can be useful to validate the deadline scheduler
> > > with RV and are not exported to tracefs.
> > >
> > > Signed-off-by: Gabriele Monaco <gmonaco@...hat.com>
> > > ---
> > >  include/trace/events/sched.h | 55 ++++++++++++++++++++++++++++++++++++
> > >  kernel/sched/deadline.c      |  8 ++++++
> > >  2 files changed, 63 insertions(+)
> > >
> > > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > > index 7b2645b50e78..f34cc1dc4a13 100644
> > > --- a/include/trace/events/sched.h
> > > +++ b/include/trace/events/sched.h
> > > @@ -609,6 +609,45 @@ TRACE_EVENT(sched_pi_setprio,
> > >  		__entry->oldprio, __entry->newprio)
> > >  );
> > >  
> > > +/*
> > > +DECLARE_EVENT_CLASS(sched_dl_template,
> > > +
> > > +	TP_PROTO(struct sched_dl_entity *dl),
> > > +
> > > +	TP_ARGS(dl),
> > > +
> > > +	TP_STRUCT__entry(
> > > +		__field(	struct task_struct *,	tsk		)
> > > +		__string(	comm,	dl->dl_server ? "server" : container_of(dl, struct task_struct, dl)->comm	)
> > > +		__field(	pid_t,	pid		)
> > > +		__field(	s64,	runtime		)
> > > +		__field(	u64,	deadline	)
> > > +		__field(	int,	dl_yielded	)
> >
> > I wonder if, while we are at it, we want to print all the other
> > fields as well (they might turn out to be useful). That would be
> >
> > .:: static (easier to retrieve with just a trace)
> > - dl_runtime
> > - dl_deadline
> > - dl_period
> >
> > .:: behaviour (RECLAIM)
> > - flags
> >
> > .:: state
> > - dl_ bool flags in addition to dl_yielded
>
> All these things are used as _tp(). That means they don't have trace
> buffer entries ever, why fill out fields?
>
Right, that is a relic of the way I put it together initially: this
whole block is commented out, which is indeed confusing and barely
noticeable in the patch.
The tracepoints are in fact not exported to tracefs and do not use the
print format.
I should have removed this; the real ones are at the bottom of the
file.
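For reference, the real ones use the bare DECLARE_TRACE() form, along
these lines (a sketch from memory, not the exact hunk; note that,
depending on the tree, DECLARE_TRACE() may append the _tp suffix to the
name by itself):

	/*
	 * Bare tracepoints: DECLARE_TRACE() generates trace_<name>() and
	 * register_trace_<name>() for in-kernel users, without creating
	 * any tracefs event or print format.
	 */
	DECLARE_TRACE(sched_dl_throttle_tp,
		TP_PROTO(struct sched_dl_entity *dl_se),
		TP_ARGS(dl_se));

	DECLARE_TRACE(sched_dl_replenish_tp,
		TP_PROTO(struct sched_dl_entity *dl_se),
		TP_ARGS(dl_se));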
>
> > > +	),
> > > +
> > > +	TP_fast_assign(
> > > +		__assign_str(comm);
> > > +		__entry->pid = dl->dl_server ? -1 : container_of(dl, struct task_struct, dl)->pid;
> > > +		__entry->runtime = dl->runtime;
> > > +		__entry->deadline = dl->deadline;
> > > +		__entry->dl_yielded = dl->dl_yielded;
> > > +	),
> > > +
> > > +	TP_printk("comm=%s pid=%d runtime=%lld deadline=%lld yielded=%d",
> >                                                    ^^^
> > llu ?
> >
As above, this should all go away.
> > > +		  __get_str(comm), __entry->pid,
> > > +		  __entry->runtime, __entry->deadline,
> > > +		  __entry->dl_yielded)
> > > +);
> >
> > ...
> >
> > > @@ -1482,6 +1486,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
> > >  
> > >  throttle:
> > >  	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
> > > +		trace_sched_dl_throttle_tp(dl_se);
> > >  		dl_se->dl_throttled = 1;
> >
> > I believe we also need to trace the dl_check_constrained_dl()
> > throttle, please take a look.
Probably yes; strangely, I couldn't see failures without it, but that
may be down to my test setup. I'm going to have a look.
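For context, that throttle site would presumably get the same call,
roughly like this (paraphrased from kernel/sched/deadline.c; the trace
call placement is my guess, not a tested hunk, and details may differ
by tree):

	static inline void dl_check_constrained_dl(struct sched_dl_entity *dl_se)
	{
		struct rq *rq = rq_of_dl_se(dl_se);

		/* Constrained deadline task past its deadline but within the period. */
		if (dl_time_before(dl_se->deadline, rq_clock(rq)) &&
		    dl_time_before(rq_clock(rq), dl_next_period(dl_se))) {
			if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se)))
				return;
			trace_sched_dl_throttle_tp(dl_se);	/* proposed addition */
			dl_se->dl_throttled = 1;
			if (dl_se->runtime > 0)
				dl_se->runtime = 0;
		}
	}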
> > Also - we discussed this point a little already offline - but I
> > still wonder if we have to do anything special for dl-server defer.
> > Those entities are started as throttled until 0-lag, so maybe we
> > should still trace them explicitly as so?
The naming might need a bit of a consistency check here, but for the
monitor the server is either running, armed or preempted. Before the
0-lag time it will never be running, so it will stay armed (fair tasks
running) or preempted (rt tasks running).
armed and preempted have _throttled variants just to indicate that an
explicit throttle event occurred.
We might want to start the server in armed_throttled if we are really
guaranteed not to see a throttle event, but I think that would
complicate the model considerably.
We could instead validate the 0-lag concept in a separate,
server-specific model.
Does this initial model feel particularly wrong for the server case?
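To make that concrete, this is roughly the state space I have in mind
(indicative only; the transition labels are my paraphrase, not the
actual monitor events):

	running       <-> armed                  (server stop / pick)
	armed         <-> preempted              (rt task picked / done)
	armed          -> armed_throttled        (throttle)
	preempted      -> preempted_throttled    (throttle)
	*_throttled    -> armed / preempted      (replenish)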
> > In addition, since it's related, maybe we should do something about
> > sched_switch event, that is currently not aware of deadlines,
> > runtimes, etc.
I'm not sure I follow you here: what relation between sched_switch and
runtime/deadline should we enforce?
We don't really force the switch to occur timely after throttling, is
that what you mean?
Or that a switch must occur again timely after replenishment?
> As per the whole _tp() thing, you can attach to the actual
> sched_switch tracepoint with a module and read whatever you want.
Yeah, I believe Juri was referring to model constraints on the already
existing events rather than to new tracepoints here.
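For completeness, attaching to it from a module would look roughly like
this (a sketch only; the probe signature below matches recent kernels
and may differ on older ones):

	#include <linux/module.h>
	#include <linux/sched/deadline.h>
	#include <trace/events/sched.h>

	/*
	 * Probe for the existing sched_switch tracepoint: full task_structs
	 * are available here, so the dl fields can be read directly.
	 */
	static void probe_switch(void *data, bool preempt,
				 struct task_struct *prev,
				 struct task_struct *next,
				 unsigned int prev_state)
	{
		if (dl_task(next))
			pr_debug("next dl: runtime=%lld deadline=%llu\n",
				 next->dl.runtime, next->dl.deadline);
	}

	static int __init switch_probe_init(void)
	{
		return register_trace_sched_switch(probe_switch, NULL);
	}

	static void __exit switch_probe_exit(void)
	{
		unregister_trace_sched_switch(probe_switch, NULL);
		tracepoint_synchronize_unregister();
	}

	module_init(switch_probe_init);
	module_exit(switch_probe_exit);
	MODULE_LICENSE("GPL");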
Thanks both,
Gabriele