linux-kernel - Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model

Message-ID: <20250716154714.GZ1613200@noisy.programming.kicks-ass.net>
Date: Wed, 16 Jul 2025 17:47:14 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	linux-trace-kernel@...r.kernel.org, Nam Cao <namcao@...utronix.de>,
	Tomas Glozar <tglozar@...hat.com>, Juri Lelli <jlelli@...hat.com>,
	Clark Williams <williams@...hat.com>,
	John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model

On Wed, Jul 16, 2025 at 05:09:43PM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 14:38 +0200, Peter Zijlstra wrote:
> > On Tue, Jul 15, 2025 at 09:14:29AM +0200, Gabriele Monaco wrote:
> > > Add the following tracepoints:
> > > * sched_set_need_resched(tsk, cpu, tif)
> > >     Called when a task is set the need resched [lazy] flag
> > > * sched_switch_vain(preempt, tsk, tsk_state)
> > >     Called when a task is selected again during __schedule
> > >     i.e. prev == next == tsk : no real context switch
> > 
> > > @@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq,
> > > struct task_struct *p,
> > >  	int flags = DEQUEUE_NOCLOCK;
> > >  
> > >  	if (signal_pending_state(task_state, p)) {
> > > +		trace_sched_set_state_tp(p, TASK_RUNNING, true);
> > >  		WRITE_ONCE(p->__state, TASK_RUNNING);
> > >  		*task_state_p = TASK_RUNNING;
> > >  		return false;
> > 
> > I'm confused on the purpose of this. How does this relate to say the
> > wakeup in signal_wake_up_state() ?
> 
> Also this adds more context: models like sssw (in this series) expect
> that, after a task is set to sleepable, it either goes to sleep or is
> woken up/set to runnable.
> 
> In this specific case, the task is set to runnable without tracing it,
> so the model doesn't know what happened, since it may not see a wakeup
> after that (the task is already runnable).
> 
> Now I'm not sure if there are other events that we are guaranteed to
> see to reconstruct this specific case (at some point we should see the
> signal, I assume).
> This just simplified things as that is the only state change that was
> not traced.
> 
> Am I missing anything obvious here?


AFAICT this is just a normal wakeup race.

Given:

  for (;;) {
    set_current_state(TASK_UNINTERRUPTIBLE);
    if (COND)
      break;
    schedule();
  }
  __set_current_state(TASK_RUNNING);

vs

  COND=1;
  wake_up_state(p, TASK_UNINTERRUPTIBLE);

The race is seeing COND before or after hitting schedule(). With
interruptible this simply becomes:


  for (;;) {
    set_current_state(TASK_INTERRUPTIBLE);
    if (SIGPENDING || COND)
      break;
    schedule();
  }
  __set_current_state(TASK_RUNNING);

vs

  COND=1;
  wake_up_state(p, TASK_INTERRUPTIBLE);

vs

  set_tsk_thread_flag(p, TIF_SIGPENDING);
  wake_up_state(p, TASK_INTERRUPTIBLE);


(same with KILLABLE, but for fatal signals only)
(except the signal thing will often exit with -EINTR / -ERESTARTSYS, but
that shouldn't matter here, right?)

How is the race for seeing SIGPENDING different from seeing COND?

Both will issue a wakeup; except in one case the wakeup is superfluous
because schedule() didn't end up blocking, since the task had already
seen the condition, while in the other case it did block and the wakeup
is indeed needed.
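
To make the symmetry concrete, here is a rough single-threaded userspace
model of the interleavings above. It is only a sketch: every model_* name
is hypothetical (none of them are kernel APIs), the interleavings are
replayed by hand rather than raced, and model_schedule() only
approximates the signal_pending_state() check in try_to_block_task().
The return value says whether the modelled wake_up_state() had anything
to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are NOT kernel definitions. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };
enum model_event { EV_COND, EV_SIGNAL };

struct model_task {
	enum model_state state;
	bool blocked;		/* set once the modelled schedule() blocks */
	bool cond;		/* stands in for COND */
	bool sigpending;	/* stands in for TIF_SIGPENDING */
};

/* Waker side, first half: publish the event. */
static void model_post(struct model_task *t, enum model_event ev)
{
	if (ev == EV_COND)
		t->cond = true;
	else
		t->sigpending = true;
}

/* Waker side, second half: returns true only if the wakeup did work. */
static bool model_wake_up(struct model_task *t)
{
	if (t->state != MODEL_INTERRUPTIBLE)
		return false;	/* superfluous wakeup */
	t->state = MODEL_RUNNING;
	t->blocked = false;
	return true;
}

/* The "if (SIGPENDING || COND) break;" test of the wait loop. */
static bool model_should_break(const struct model_task *t)
{
	return t->sigpending || t->cond;
}

/* Assumed approximation of the block-or-bail decision in schedule(). */
static void model_schedule(struct model_task *t)
{
	if (t->state == MODEL_RUNNING)
		return;
	if (t->sigpending) {	/* signal seen here: do not block */
		t->state = MODEL_RUNNING;
		return;
	}
	t->blocked = true;
}

/* Event lands before the waiter's check: the loop breaks on its own. */
bool model_event_before_check(enum model_event ev)
{
	struct model_task t = { MODEL_RUNNING, false, false, false };

	t.state = MODEL_INTERRUPTIBLE;
	model_post(&t, ev);
	if (model_should_break(&t))
		t.state = MODEL_RUNNING;
	return model_wake_up(&t);
}

/* Event lands after the waiter blocked: the wakeup is needed. */
bool model_event_after_block(enum model_event ev)
{
	struct model_task t = { MODEL_RUNNING, false, false, false };

	t.state = MODEL_INTERRUPTIBLE;
	if (!model_should_break(&t))
		model_schedule(&t);
	model_post(&t, ev);
	return model_wake_up(&t);
}

/* Signal lands after the check but is noticed inside schedule(). */
bool model_signal_seen_in_schedule(void)
{
	struct model_task t = { MODEL_RUNNING, false, false, false };

	t.state = MODEL_INTERRUPTIBLE;
	if (!model_should_break(&t)) {
		model_post(&t, EV_SIGNAL);
		model_schedule(&t);
	}
	return model_wake_up(&t);
}

/* Signal and COND behave the same in both orderings of the model. */
void model_self_check(void)
{
	assert(!model_event_before_check(EV_COND));
	assert(!model_event_before_check(EV_SIGNAL));
	assert(model_event_after_block(EV_COND));
	assert(model_event_after_block(EV_SIGNAL));
	assert(!model_signal_seen_in_schedule());
}
```

In this model the only ordering specific to signals is the last one,
where the pending signal is noticed inside the modelled schedule()
rather than by the loop's own check; the wakeup that follows is still
just a superfluous one.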

Anyway, I don't mind tracing it, but I am confused on that new (3rd)
argument to trace_sched_set_state_tp().