[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250924124107.GJ3419281@noisy.programming.kicks-ass.net>
Date: Wed, 24 Sep 2025 14:41:07 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Xiang Gao <gxxa03070307@...il.com>, mhiramat@...nel.org,
mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
mathieu.desnoyers@...icios.com, andrii@...nel.org, mingo@...nel.org,
oleg@...hat.com, akpm@...ux-foundation.org, gmonaco@...hat.com,
ricardo.neri-calderon@...ux.intel.com, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
gaoxiang17 <gaoxiang17@...omi.com>
Subject: Re: [PATCH v2 1/1] tracing/sched: add 'next_policy' to
trace_sched_switch
On Fri, Sep 12, 2025 at 10:30:50AM -0400, Steven Rostedt wrote:
> On Tue, 26 Aug 2025 20:48:54 +0800
> Xiang Gao <gxxa03070307@...il.com> wrote:
>
> > From: gaoxiang17 <gaoxiang17@...omi.com>
> >
> > Sometimes, when analyzing some real-time process issues, it is necessary to know the sched policy.
> >
> > Show up in the trace as:
> >
> > 72.267374: sched_switch: prev_comm=grep prev_pid=67 prev_prio=19 prev_state=S ==> next_comm=cat next_pid=66 next_prio=120 next_policy=normal
> > 72.267594: sched_switch: prev_comm=cat prev_pid=66 prev_prio=120 prev_state=R+ ==> next_comm=grep next_pid=67 next_prio=19 next_policy=RR
> > 562.192567: sched_switch: prev_comm=grep prev_pid=85 prev_prio=19 prev_state=S ==> next_comm=cat next_pid=84 next_prio=120 next_policy=normal
> > 562.192944: sched_switch: prev_comm=cat prev_pid=84 prev_prio=120 prev_state=R+ ==> next_comm=grep next_pid=85 next_prio=19 next_policy=FIFO
> >
>
> Peter,
>
> Are you OK with extending the sched switch tracepoint?
I'm not convinced; this is a bit like whitespace patches, people will
want their favourite field added and before you know it the thing will
be fat as never before.
OTOH changing it will be yet another opportunity to find people that are
not following the recommended practise. Someone will come forward and go
complain we broke their shit or something.
Anyway, if we go touch it, you might as well go do it right and move
prev_prio, next_prio and the polcy things into a u8 [*].
Also, I don't know why we have prev_prio, but if that is useful,
shouldn't we also have prev_policy for consistenty sake?
That said, you can mostly guess the policy from the prio, I mean the
distinction between fair/batch and rr/fifo gets lots, but you can
readily tell the difference between the fair and rt and deadline tasks.
[*] I mean, we do use -1 for dl, and that'll map to 255 in u8, but we
can't really use s8 since MAX_PRIO is 140. u8 is bits plenty, but just a
bit weird *shurg*.
this way the event shrinks by 4 bytes, which is at least somewhat of a
reason to do this.
> > Signed-off-by: gaoxiang17 <gaoxiang17@...omi.com>
> > ---
> > include/trace/events/sched.h | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > index 7b2645b50e78..00336211aca6 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -234,6 +234,7 @@ TRACE_EVENT(sched_switch,
> > __array( char, next_comm, TASK_COMM_LEN )
> > __field( pid_t, next_pid )
> > __field( int, next_prio )
> > + __field( unsigned int, next_policy )
> > ),
> >
> > TP_fast_assign(
> > @@ -244,10 +245,11 @@ TRACE_EVENT(sched_switch,
> > memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > __entry->next_pid = next->pid;
> > __entry->next_prio = next->prio;
> > + __entry->next_policy = next->policy;
> > /* XXX SCHED_DEADLINE */
> > ),
> >
> > - TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d",
> > + TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d next_policy=%s",
> > __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
> >
> > (__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
> > @@ -263,7 +265,16 @@ TRACE_EVENT(sched_switch,
> > "R",
> >
> > __entry->prev_state & TASK_REPORT_MAX ? "+" : "",
> > - __entry->next_comm, __entry->next_pid, __entry->next_prio)
> > + __entry->next_comm, __entry->next_pid, __entry->next_prio,
> > + __print_symbolic(__entry->next_policy,
> > + { SCHED_NORMAL, "normal" },
> > + { SCHED_FIFO, "FIFO" },
> > + { SCHED_RR, "RR" },
> > + { SCHED_BATCH, "batch" },
> > + { SCHED_IDLE, "idle" },
> > + { SCHED_DEADLINE, "deadline" },
> > + { SCHED_EXT, "sched_ext"})
> > + )
> > );
> >
> > /*
>
Powered by blists - more mailing lists